Statistical Simulation of Superscalar Architectures using Commercial Workloads
Slide 1: Statistical Simulation of Superscalar Architectures using Commercial Workloads
Lieven Eeckhout and Koen De Bosschere
Dept. of Electronics and Information Systems (ELIS), Ghent University, Belgium
CAECW'01, January 21, 2001
Slide 2: Outline
- Introduction
- Statistical Simulation
  - Statistical profiling
  - Synthetic trace generation
- Methodology
- Evaluation
- Conclusion
Slide 3: Introduction
- Architectural simulation (trace-driven or execution-driven)
  - accurate
  - long simulation times
  - long traces must be stored
- Need for fast simulation techniques
  - simulating only part of a full trace
  - analytical modeling
  - trace sampling
  - statistical simulation
Slide 4: Goal
- Previous work used SPEC benchmarks to evaluate statistical simulation
- In this talk we use both commercial and scientific workloads: SPECint, SPECfp, system traces, multimedia, X graphics, database
Slide 5: Statistical Simulation
Three steps:
- extract a statistical profile from a program execution
- generate a synthetic trace from it
- simulate the synthetic trace on a trace-driven simulator
Two major advantages:
- the statistical profile is more compact than a full trace
- simulation is fast due to its statistical nature, which allows design space exploration in limited time
Slide 6: Statistical Simulation
real trace (e.g. SPEC benchmark)
  -> branch profiling -> branch statistics
  -> cache profiling -> cache statistics
  -> instruction profiling -> instruction statistics
branch + cache + instruction statistics = statistical profile
statistical profile -> synthetic trace generator -> synthetic trace -> trace-driven simulator
Slide 7: Statistical Profiling
- Microarchitecture-independent statistics: instruction statistics
- Microarchitecture-dependent statistics: branch statistics, cache statistics
- Result: statistical simulation can only be used to explore design options of the processor core (the caches and the branch predictor are fixed)
Slide 8: Statistical Profiling: Instruction Statistics
- Instruction mix (13 classes)
- Number of register operands
- Age of register operands: probability that a register operand was produced δ instructions before it in the trace (RAW dependencies only)
- Memory dependencies: probability that a load is memory-dependent on the δ-th store before it in the trace (RAW dependencies only)
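A minimal Python sketch of how the instruction statistics above might be collected in one pass over a trace. The `Insn` record, the op names and the function names are hypothetical, since the slides do not define a trace format. As a simplifying assumption, the age and memory-dependency histograms are normalized only over operands and loads that actually have a producer in the trace, which may differ from the original profiler.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class Insn:
    """Hypothetical trace record; the slides do not define a trace format."""
    op: str                     # instruction class, e.g. "add", "load", "store"
    srcs: tuple = ()            # source registers read
    dst: Optional[str] = None   # destination register, if any
    addr: Optional[int] = None  # memory address for loads and stores

def normalize(counter):
    """Turn raw counts into probabilities."""
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()} if total else {}

def instruction_statistics(trace):
    mix, n_operands = Counter(), Counter()
    reg_age = Counter()     # delta in instructions from producer to consumer (RAW only)
    mem_dep = Counter()     # delta in stores from a load back to the store it reads (RAW only)
    last_writer = {}        # register -> index of the most recent instruction writing it
    last_store_to = {}      # address -> ordinal number of the most recent store to it
    stores_seen = 0
    for i, insn in enumerate(trace):
        mix[insn.op] += 1
        n_operands[len(insn.srcs)] += 1
        for reg in insn.srcs:   # read sources before this instruction's own write
            if reg in last_writer:
                reg_age[i - last_writer[reg]] += 1
        if insn.op == "load" and insn.addr in last_store_to:
            # delta = 1 means the immediately preceding store
            mem_dep[stores_seen - last_store_to[insn.addr] + 1] += 1
        if insn.op == "store":
            stores_seen += 1
            last_store_to[insn.addr] = stores_seen
        if insn.dst is not None:
            last_writer[insn.dst] = i
    return {"mix": normalize(mix), "num_operands": normalize(n_operands),
            "reg_age": normalize(reg_age), "mem_dep": normalize(mem_dep)}
```

Only the instruction statistics are microarchitecture-independent, so this pass does not need a cache or branch predictor model.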
Slide 9: Statistical Profiling: Branch Statistics
- Six branch types: conditional branch, unconditional branch, call with offset, indirect jump, indirect call, return
- Distinction between:
  - branch prediction accuracy: the pipeline is refilled on a branch misprediction
  - branch target prediction accuracy: a single-cycle bubble in the pipeline when the branch is predicted correctly but its target is mispredicted
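The distinction between the two accuracies can be captured per branch type, as in this Python sketch. The `BranchEvent` record and the short type labels are hypothetical; the sketch assumes the real trace has already been run through the fixed branch predictor, and it conditions the target statistic on a correctly predicted direction, matching the two penalty cases on the slide.

```python
from collections import defaultdict
from dataclasses import dataclass

# Labels for the six branch types on the slide (the names are our own).
BRANCH_TYPES = ("cond", "uncond", "call", "ind_jump", "ind_call", "return")

@dataclass
class BranchEvent:
    """Hypothetical record from running the real trace through a fixed branch predictor."""
    kind: str           # one of BRANCH_TYPES
    direction_ok: bool  # direction (taken / not taken) predicted correctly
    target_ok: bool     # branch target predicted correctly

def branch_statistics(events):
    """Per branch type:
    p_mispredict  = P(direction mispredicted)                -> pipeline refill
    p_target_miss = P(target mispredicted | direction right) -> single-cycle bubble
    """
    total = defaultdict(int)
    dir_miss = defaultdict(int)
    tgt_miss = defaultdict(int)
    for ev in events:
        if ev.kind not in BRANCH_TYPES:
            raise ValueError(f"unknown branch type: {ev.kind}")
        total[ev.kind] += 1
        if not ev.direction_ok:
            dir_miss[ev.kind] += 1
        elif not ev.target_ok:
            tgt_miss[ev.kind] += 1
    stats = {}
    for kind, n in total.items():
        dir_ok = n - dir_miss[kind]
        stats[kind] = {
            "p_mispredict": dir_miss[kind] / n,
            "p_target_miss": tgt_miss[kind] / dir_ok if dir_ok else 0.0,
        }
    return stats
```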
Slide 10: Statistical Profiling: Cache Statistics
- D-cache statistics: L1 D-cache miss rate, L2 D-cache miss rate
- I-cache statistics: L1 I-cache miss rate, L2 I-cache miss rate
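A short Python sketch of deriving the two miss rates for one access stream. It assumes each access is recorded by the level that serviced it, and it treats the L2 miss rate as a local rate (the fraction of L1 misses that also miss in L2); the slides do not say whether local or global rates were used, so that choice is an assumption.

```python
from collections import Counter

def cache_statistics(levels):
    """levels: for each access (instruction fetch or data access), the level that
    serviced it: 1 = L1 hit, 2 = L2 hit, 3 = main memory.
    Returns (L1 miss rate, L2 miss rate), where the L2 miss rate is assumed to be
    local: the fraction of L1 misses that also miss in L2."""
    n = Counter(levels)
    if set(n) - {1, 2, 3}:
        raise ValueError("levels must be 1, 2 or 3")
    accesses = sum(n.values())
    l1_misses = n[2] + n[3]
    l1_miss_rate = l1_misses / accesses if accesses else 0.0
    l2_miss_rate = n[3] / l1_misses if l1_misses else 0.0
    return l1_miss_rate, l2_miss_rate
```

The function would be called once on the instruction-fetch stream and once on the data-access stream to obtain the I-cache and D-cache statistics.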
Slide 11: Synthetic Trace Generation
- Instruction by instruction, through random number generation
- For each instruction, determine:
  - instruction type
  - number of operands
  - age of register operands
  - memory dependency
  - branch behavior
  - D-cache behavior
  - I-cache behavior
[Figure: example synthetic trace fragment (st, add, ld, br) annotated with an I-cache miss, a D-cache miss and a mispredicted branch]
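Generation can be sketched in Python as drawing each property of each instruction from the profile's distributions. The profile layout (tables of {value: probability} plus scalar miss and misprediction probabilities), the op names and all function names are assumptions for illustration. The sketch is deliberately simplified: it uses one misprediction probability for all branches (ignoring branch types and target mispredictions) and treats the L2 probability as conditional on an L1 miss.

```python
import random
from dataclasses import dataclass, field

@dataclass
class SynthInsn:
    op: str
    dep_ages: list = field(default_factory=list)  # distance to producer, per register operand
    mem_dep: int = 0            # loads: depends on the mem_dep-th preceding store (0 = none)
    icache_level: int = 1       # 1 = L1 hit, 2 = L2 hit, 3 = main memory
    dcache_level: int = 0       # loads only; 0 = not a load
    mispredicted: bool = False  # branches only

def sample(rng, dist):
    """Draw one key from a {value: probability} dict."""
    values = list(dist)
    return rng.choices(values, weights=[dist[v] for v in values], k=1)[0]

def cache_level(rng, p_l1_miss, p_l2_miss):
    """Miss in L1 with p_l1_miss; given an L1 miss, miss in L2 with p_l2_miss."""
    if rng.random() >= p_l1_miss:
        return 1
    return 3 if rng.random() < p_l2_miss else 2

def generate(profile, n, seed=0):
    """Generate n synthetic instructions, one at a time, from a statistical profile."""
    rng = random.Random(seed)
    trace = []
    for _ in range(n):
        op = sample(rng, profile["mix"])
        num_ops = sample(rng, profile["num_operands"][op])
        insn = SynthInsn(
            op=op,
            dep_ages=[sample(rng, profile["reg_age"]) for _ in range(num_ops)],
            icache_level=cache_level(rng, *profile["icache"]),
        )
        if op == "load":
            insn.mem_dep = sample(rng, profile["mem_dep"])
            insn.dcache_level = cache_level(rng, *profile["dcache"])
        elif op == "branch":
            insn.mispredicted = rng.random() < profile["p_mispredict"]
        trace.append(insn)
    return trace
```

Seeding the generator makes a synthetic trace reproducible, while different seeds give different synthetic traces from the same profile.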
Slide 12: Methodology: Microarchitecture
- Out-of-order processor: 8-issue and 16-issue, with windows of 64 and 128 instructions
- McFarling branch predictor
- Small cache configuration: 8KB direct-mapped L1 I-cache, 8KB direct-mapped L1 D-cache, 64KB 2-way set-associative unified L2 cache
- Large cache configuration: 32KB direct-mapped L1 I-cache, 64KB 2-way set-associative L1 D-cache, 512KB 4-way set-associative unified L2 cache
- Access times: L1 I-cache 1 cycle, L1 D-cache 2 cycles, L2 cache 10 cycles, main memory 80 cycles
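The evaluated design points can be written down as a small configuration table. This Python sketch is hypothetical (the class and field names are ours); it pairs 8-issue with a 64-entry window and 16-issue with a 128-entry window, as in the legend of the IPC-error-versus-static-instruction-count plot, and it does not model the McFarling branch predictor.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Cache:
    size_kb: int
    assoc: int     # 1 = direct-mapped
    latency: int   # access time in cycles

@dataclass(frozen=True)
class Config:
    issue_width: int
    window: int    # instruction window size
    l1i: Cache
    l1d: Cache
    l2: Cache      # unified L2
    memory_latency: int = 80

SMALL = dict(l1i=Cache(8, 1, 1), l1d=Cache(8, 1, 2), l2=Cache(64, 2, 10))
LARGE = dict(l1i=Cache(32, 1, 1), l1d=Cache(64, 2, 2), l2=Cache(512, 4, 10))
CORES = [(8, 64), (16, 128)]  # (issue width, window size)

# Four design points: each core paired with each cache configuration.
CONFIGS = {
    (f"{issue}-issue/{window}-entry", name): Config(issue, window, **caches)
    for (issue, window), (name, caches) in product(CORES, [("small", SMALL), ("large", LARGE)])
}
```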
Slide 13: Methodology: Benchmarks
- 8 SPECint95 benchmarks (li, gcc, compress, go, ijpeg, vortex, m88ksim, perl)
- 5 SPECfp95 benchmarks (hydro2d, su2cor, swim, tomcatv, wave5)
- 8 IBS system traces (mpeg, jpeg, gs, verilog, gcc, sdet, nroff, groff)
- 4 MediaBench applications (g721, gs, gsm, mpeg2)
- 4 X graphics benchmarks (DooM, POVRay, Xanim, Quake)
- 2 TPC-D queries running on Postgres 6.3
- About 200 million instructions per trace
Slide 14: Evaluation
$$\text{IPC prediction error} = \frac{\mathrm{IPC}_{\text{real trace}} - \mathrm{IPC}_{\text{synthetic trace}}}{\mathrm{IPC}_{\text{real trace}}}$$
- IPC_real trace = IPC when running the real trace on the trace-driven simulator
- IPC_synthetic trace = IPC when running the synthetic trace generated from the statistical profile of the real trace
- Simulation speed: $s_{\mathrm{IPC}} / \bar{x}_{\mathrm{IPC}}$ (standard deviation over mean of the IPC) is less than 1% after simulating 1 million instructions
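Both evaluation metrics are straightforward to compute, as in this short Python sketch. The function names are hypothetical, and reading $s_{\mathrm{IPC}} / \bar{x}_{\mathrm{IPC}}$ as a coefficient of variation over several synthetic traces (for example, generated with different random seeds) is our assumption about how the variation was measured.

```python
from statistics import mean, stdev

def ipc_prediction_error(ipc_real, ipc_synthetic):
    """Relative IPC error as defined on the slide: (IPC_real - IPC_synthetic) / IPC_real."""
    return (ipc_real - ipc_synthetic) / ipc_real

def coefficient_of_variation(ipcs):
    """s_IPC / mean IPC over several synthetic-trace simulations (needs at least 2 values)."""
    return stdev(ipcs) / mean(ipcs)

def converged(ipcs, threshold=0.01):
    """True if the variation criterion from the slide (below 1%) is met."""
    return coefficient_of_variation(ipcs) < threshold
```

For example, `ipc_prediction_error(2.0, 1.5)` gives 0.25, i.e. the synthetic trace underestimates IPC by 25%; with this sign convention an overestimate yields a negative error.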
Slide 15: IPC prediction error (1)
[Figure: IPC prediction error (y-axis, -30% to 40%) per benchmark, grouped as SPECint95, SPECfp95, IBS, MediaBench, X graphics and TPC-D; 16-issue, 128-entry window, small cache configuration. Two bars exceed the axis range at 157% and 135%, annotated "high D-cache miss rate".]
Slide 16: IPC prediction error (2)
[Figure: IPC prediction error (y-axis, -30% to 30%) per benchmark, grouped as SPECint95, SPECfp95, IBS, MediaBench, X graphics and TPC-D; 16-issue, 128-entry window, large cache configuration.]
Slide 17: IPC prediction error vs. static instruction count
[Figure: IPC prediction error (y-axis, -40% to 160%) versus static instruction count (x-axis: number of instructions executed at least once) for four configurations (w = window size, i = issue width): w = 64, i = 8 and w = 128, i = 16, each with the 'small' and the 'large' cache configuration. Labeled points include nroff, jpeg (IBS), verilog, DooM, mpeg (IBS), sdet, groff, Quake, gs (IBS), gcc, vortex, go, TPC-D and gcc (IBS).]
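The x-axis metric can be computed directly from a trace. This tiny Python sketch assumes the trace is available as a sequence of instruction addresses (program counters); that representation and the function name are assumptions, since the slides do not describe the trace format.

```python
def static_instruction_count(pcs):
    """Number of distinct instructions executed at least once, given the
    sequence of instruction addresses (PCs) in a trace."""
    return len(set(pcs))
```

A loop that executes the same few addresses millions of times still has a small static count, which is the property the conclusions below relate to higher prediction errors.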
Slide 18: Conclusion (1)
Higher IPC prediction errors for applications with a smaller static instruction count:
- MediaBench applications
- SPECfp95 benchmarks
- 2 X graphics benchmarks (POVRay and Xanim)
- 5 SPECint95 benchmarks
Slide 19: Conclusion (2)
Smaller IPC prediction errors for applications with a larger instruction footprint:
- IBS system traces
- TPC-D traces
- 2 X graphics benchmarks (DooM and Quake)
- 3 SPECint95 benchmarks (go, gcc, vortex)
For these applications, the IPC prediction error lies between -1% and 25%.
Slide 20: Conclusion (3)
Statistical simulation is a useful fast simulation technique for commercial workloads: because commercial workloads have a larger instruction footprint, their instruction stream shows higher variability, which makes a statistical technique more powerful.