APPENDIX Summary of Benchmarks

The experimental results presented throughout this thesis use programs from four benchmark suites:

Cyclone benchmarks (available from [Cyc]): programs used to evaluate the Cyclone tool [Jim+02]. The ones we tested are small but computationally intensive applications that make heavy use of arrays and pointers.

  aes: Rijndael block-cipher encryption.
  cacm: Adaptive arithmetic coding for data compression.
  cfrac: Continued fraction algorithm.
  grobner: Gröbner bases computation.
  matxmult: Matrix multiplication.
  ppm: Arithmetic encoding and decoding.
  tile: Text document partitioning into tiles.

Olden benchmarks (available from [Olden]): programs used to evaluate the Olden C compiler [Car+95]. These are relatively small programs that each perform a monolithic task, using a variety of dynamically allocated data structures.

  bh: Barnes-Hut N-body force-computation algorithm; uses a heterogeneous octree.
  bisort: Forward and backward sort of integers using two disjoint bitonic sequences that are merged to obtain the sorted result; uses a binary tree.
  em3d: Electromagnetic wave propagation in a 3D object; uses singly-linked lists.
  health: Colombian healthcare simulation; uses doubly-linked lists.
  mst: Minimum spanning tree of a graph; uses an array of singly-linked lists.
  perimeter: Perimeters of regions in images; uses a quad-tree.
  power: Power pricing system optimization problem solver; uses an N-way tree and singly-linked lists.
  treeadd: Recursive sum of values in a balanced B-tree.
  tsp: Traveling-salesman-problem solver using a partitioning algorithm and a closest-point heuristic; uses a balanced binary tree.

SPEC CPU95 [SPEC]: includes all the C programs from the integer (CINT) suite.

  compress: An in-memory version of the common UNIX utility.
  gcc: Based on the GNU C compiler version [...].
  go: An internationally ranked go-playing program.
  ijpeg: Image compression/decompression on in-memory images.
  li: Xlisp interpreter.
  m88ksim: A chip simulator for the Motorola microprocessor.
  perl: An interpreter for the Perl language.
  vortex: An object-oriented database.

SPEC CPU2000 [SPEC]: includes select C programs from both the integer (CINT) and floating point (CFP) suites.

  ammp (CFP): Computational chemistry.
  art (CFP): Image recognition / neural networks.
  bzip2: Compression.
  crafty: Game playing: chess.
  equake (CFP): Seismic wave propagation simulation.
  gap: Group theory, interpreter.
  gzip: Compression.
  mcf: Combinatorial optimization.
  mesa (CFP): 3-D graphics library.
  parser: Word processing.
  twolf: Place and route simulator.
  vpr: FPGA circuit placement and routing.

Tables A.1 and A.2 list the programs used in our experiments, along with their size (in lines of code), baseline compilation time (wallclock time, in seconds), and baseline execution times (wallclock time, in seconds).

Inputs were selected to give reasonable running times for comparison. For the Cyclone and Olden benchmarks, the inputs we used are listed in Table A.1 (either command-line arguments or input files supplied with the benchmarks). For the SPEC benchmarks, we used two different datasets, which we call the slow and fast datasets. The slow dataset is the ref set for SPEC 95 and the train set for SPEC 2000, and is used to evaluate the more efficient Memory-Safety Enforcer (MSE) and Sensitive Location Checker (SLC) in Chapters 3–7. The fast dataset is the train set for SPEC 95 and the test set for SPEC 2000, and is used to evaluate the slower Runtime Type Checker (RTC) in Chapters [...]. In Table A.2, columns (c) and (d) give the baseline execution times for the slow and fast datasets respectively.

The programs were compiled with gcc (version 3.3.2) and executed on a 1 GHz Pentium III with 512 MB RAM, running Linux (RedHat 9). For the MSE and SLC experiments, programs were compiled with -O3 optimizations. For the RTC experiments, optimizations were disabled (-O0) because enabling them increased compilation time considerably, and because we felt that the RTC, as a debugging tool, would typically be used on programs compiled without optimization.
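The baseline measurements described above (wallclock compile time and wallclock execution time under a given optimization level) could be collected with a small script along the following lines. This is a minimal sketch and not the harness used for the thesis: the helper names `gcc_command` and `wallclock_seconds`, the experiment labels, and any file names are hypothetical. Only the `-O3` and `-O0` flags are taken from the text.

```python
import subprocess
import time

# Optimization flags as stated in the text: -O3 for the MSE and SLC
# experiments, -O0 for the RTC experiments. The dictionary keys are
# hypothetical labels chosen for this sketch.
OPT_FLAGS = {"mse_slc": "-O3", "rtc": "-O0"}


def gcc_command(sources, output, experiment):
    """Build a gcc command line for one experiment (hypothetical helper)."""
    return ["gcc", OPT_FLAGS[experiment], "-o", output, *sources]


def wallclock_seconds(command):
    """Run a command to completion and return the elapsed wallclock seconds."""
    start = time.perf_counter()
    subprocess.run(
        command,
        check=True,  # raise if the compiler or benchmark exits with an error
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start
```

Assuming a benchmark whose source is a single file `aes.c` (a hypothetical name), `wallclock_seconds(gcc_command(["aes.c"], "aes", "mse_slc"))` would give a compile time and `wallclock_seconds(["./aes"])` an execution time. Repeating each measurement several times and reporting a summary is a common precaution, although the text does not say whether that was done here.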

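Table A.1 Benchmark Information

                 LOC        Compile Time (s)   Exec Time (s)   Input
  Program        (a)        (b)                (c)             (d)
  Cyclone
    aes          1,[...]    [...]              [...]           [...]
    cacm         [...]      [...]              [...]           encode test2
    cfrac        4,[...]    [...]              [...]           [...]
    grobner      4,[...]    [...]              [...]           eg03
    matxmult     1,[...]    [...]              [...]           [...]
    ppm          1,[...]    [...]              [...]           decode test1
    tile         4,[...]    [...]              [...]           sample2
  Olden
    bisort       [...]      [...]              [...]           [...]
    em3d         [...]      [...]              [...]           [...]
    health       [...]      [...]              [...]           [...]
    mst          [...]      [...]              [...]           [...]
    perimeter    [...]      [...]              [...]           [...]
    power        [...]      [...]              [...]           [...]
    treeadd      [...]      [...]              [...]           [...]
    tsp          [...]      [...]              [...]           [...]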

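Table A.2 Benchmark Information

                 LOC          Compile Time (s)   Exec Time (s)
                                                 slow      fast
  Program        (a)          (b)                (c)       (d)
  SPEC 95                                        (ref)     (train)
    compress     3,[...]      [...]              [...]     [...]
    gcc          205,[...]    [...]              [...]     [...]
    go           29,[...]     [...]              [...]     [...]
    ijpeg        31,[...]     [...]              [...]     [...]
    li           7,[...]      [...]              [...]     [...]
    m88ksim      19,[...]     [...]              [...]     [...]
    perl         26,[...]     [...]              [...]     [...]
    vortex       67,[...]     [...]              [...]     [...]
  SPEC 2000                                      (train)   (test)
    ammp         13,[...]     [...]              [...]     [...]
    art          1,[...]      [...]              [...]     [...]
    bzip2        4,[...]      [...]              [...]     [...]
    crafty       20,[...]     [...]              [...]     [...]
    equake       1,[...]      [...]              [...]     [...]
    gap          71,[...]     [...]              [...]     [...]
    gzip         8,[...]      [...]              [...]     [...]
    mcf          2,[...]      [...]              [...]     [...]
    mesa         58,[...]     [...]              [...]     [...]
    parser       11,[...]     [...]              [...]     [...]
    twolf        20,[...]     [...]              [...]     [...]
    vpr          17,[...]     [...]              [...]     [...]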


More information

Preliminary Evaluation of the Load Data Re-Computation Method for Delinquent Loads

Preliminary Evaluation of the Load Data Re-Computation Method for Delinquent Loads Preliminary Evaluation of the Load Data Re-Computation Method for Delinquent Loads Hideki Miwa, Yasuhiro Dougo, Victor M. Goulart Ferreira, Koji Inoue, and Kazuaki Murakami Dept. of Informatics, Kyushu

More information

2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set

2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set 2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set Hyesoon Kim M. Aater Suleman Onur Mutlu Yale N. Patt Department of Electrical and Computer Engineering University of Texas

More information

MODELING EFFECTS OF SPECULATIVE INSTRUCTION EXECUTION IN A FUNCTIONAL CACHE SIMULATOR AMOL SHAMKANT PANDIT, B.E.

MODELING EFFECTS OF SPECULATIVE INSTRUCTION EXECUTION IN A FUNCTIONAL CACHE SIMULATOR AMOL SHAMKANT PANDIT, B.E. MODELING EFFECTS OF SPECULATIVE INSTRUCTION EXECUTION IN A FUNCTIONAL CACHE SIMULATOR BY AMOL SHAMKANT PANDIT, B.E. A thesis submitted to the Graduate School in partial fulfillment of the requirements

More information

A Global Progressive Register Allocator

A Global Progressive Register Allocator A Global Progressive Register Allocator David Ryan Koes Seth Copen Goldstein School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213-3891 {dkoes,seth}@cs.cmu.edu Abstract This paper

More information

From CISC to RISC. CISC Creates the Anti CISC Revolution. RISC "Philosophy" CISC Limitations

From CISC to RISC. CISC Creates the Anti CISC Revolution. RISC Philosophy CISC Limitations 1 CISC Creates the Anti CISC Revolution Digital Equipment Company (DEC) introduces VAX (1977) Commercially successful 32-bit CISC minicomputer From CISC to RISC In 1970s and 1980s CISC minicomputers became

More information

Speculative Multithreaded Processors

Speculative Multithreaded Processors Guri Sohi and Amir Roth Computer Sciences Department University of Wisconsin-Madison utline Trends and their implications Workloads for future processors Program parallelization and speculative threads

More information

Dual-Core Execution: Building A Highly Scalable Single-Thread Instruction Window

Dual-Core Execution: Building A Highly Scalable Single-Thread Instruction Window Dual-Core Execution: Building A Highly Scalable Single-Thread Instruction Window Huiyang Zhou School of Computer Science University of Central Florida New Challenges in Billion-Transistor Processor Era

More information

Chapter 1. Computer Abstractions and Technology. Adapted by Paulo Lopes, IST

Chapter 1. Computer Abstractions and Technology. Adapted by Paulo Lopes, IST Chapter 1 Computer Abstractions and Technology Adapted by Paulo Lopes, IST The Computer Revolution Progress in computer technology Sustained by Moore s Law Makes novel and old applications feasible Computers

More information

José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2

José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2 CHERRY: CHECKPOINTED EARLY RESOURCE RECYCLING José F. Martínez 1, Jose Renau 2 Michael C. Huang 3, Milos Prvulovic 2, and Josep Torrellas 2 1 2 3 MOTIVATION Problem: Limited processor resources Goal: More

More information

Execution-based Prediction Using Speculative Slices

Execution-based Prediction Using Speculative Slices Execution-based Prediction Using Speculative Slices Craig Zilles and Guri Sohi University of Wisconsin - Madison International Symposium on Computer Architecture July, 2001 The Problem Two major barriers

More information

Impact of Cache Coherence Protocols on the Processing of Network Traffic

Impact of Cache Coherence Protocols on the Processing of Network Traffic Impact of Cache Coherence Protocols on the Processing of Network Traffic Amit Kumar and Ram Huggahalli Communication Technology Lab Corporate Technology Group Intel Corporation 12/3/2007 Outline Background

More information

Performance Characterization of SPEC CPU Benchmarks on Intel's Core Microarchitecture based processor

Performance Characterization of SPEC CPU Benchmarks on Intel's Core Microarchitecture based processor Performance Characterization of SPEC CPU Benchmarks on Intel's Core Microarchitecture based processor Sarah Bird ϕ, Aashish Phansalkar ϕ, Lizy K. John ϕ, Alex Mericas α and Rajeev Indukuru α ϕ University

More information

Evaluation of Existing Architectures in IRAM Systems

Evaluation of Existing Architectures in IRAM Systems Evaluation of Existing Architectures in IRAM Systems Ngeci Bowman, Neal Cardwell, Christoforos E. Kozyrakis, Cynthia Romer and Helen Wang Computer Science Division University of California Berkeley fbowman,neal,kozyraki,cromer,helenjwg@cs.berkeley.edu

More information

Introduction to Microprocessor

Introduction to Microprocessor Introduction to Microprocessor Slide 1 Microprocessor A microprocessor is a multipurpose, programmable, clock-driven, register-based electronic device That reads binary instructions from a storage device

More information

Evaluating the Performance Impact of Hardware Thread Priorities in Simultaneous Multithreaded Processors using SPEC CPU2000

Evaluating the Performance Impact of Hardware Thread Priorities in Simultaneous Multithreaded Processors using SPEC CPU2000 Evaluating the Performance Impact of Hardware Thread Priorities in Simultaneous Multithreaded Processors using SPEC CPU2000 Mitesh R. Meswani and Patricia J. Teller Department of Computer Science, University

More information

The Smart Cache: An Energy-Efficient Cache Architecture Through Dynamic Adaptation

The Smart Cache: An Energy-Efficient Cache Architecture Through Dynamic Adaptation Noname manuscript No. (will be inserted by the editor) The Smart Cache: An Energy-Efficient Cache Architecture Through Dynamic Adaptation Karthik T. Sundararajan Timothy M. Jones Nigel P. Topham Received:

More information

Performance Prediction using Program Similarity

Performance Prediction using Program Similarity Performance Prediction using Program Similarity Aashish Phansalkar Lizy K. John {aashish, ljohn}@ece.utexas.edu University of Texas at Austin Abstract - Modern computer applications are developed at a

More information

Mapping of Applications to Heterogeneous Multi-cores Based on Micro-architecture Independent Characteristics

Mapping of Applications to Heterogeneous Multi-cores Based on Micro-architecture Independent Characteristics Mapping of Applications to Heterogeneous Multi-cores Based on Micro-architecture Independent Characteristics Jian Chen, Nidhi Nayyar and Lizy K. John Department of Electrical and Computer Engineering The

More information

Transparent Pointer Compression for Linked Data Structures

Transparent Pointer Compression for Linked Data Structures Transparent Pointer Compression for Linked Data Structures lattner@cs.uiuc.edu Vikram Adve vadve@cs.uiuc.edu June 12, 2005 MSP 2005 http://llvm.cs.uiuc.edu llvm.cs.uiuc.edu/ Growth of 64-bit computing

More information

Relative Performance of a Multi-level Cache with Last-Level Cache Replacement: An Analytic Review

Relative Performance of a Multi-level Cache with Last-Level Cache Replacement: An Analytic Review Relative Performance of a Multi-level Cache with Last-Level Cache Replacement: An Analytic Review Bijay K.Paikaray Debabala Swain Dept. of CSE, CUTM Dept. of CSE, CUTM Bhubaneswer, India Bhubaneswer, India

More information

Wish Branch: A New Control Flow Instruction Combining Conditional Branching and Predicated Execution

Wish Branch: A New Control Flow Instruction Combining Conditional Branching and Predicated Execution Wish Branch: A New Control Flow Instruction Combining Conditional Branching and Predicated Execution Hyesoon Kim Onur Mutlu Jared Stark David N. Armstrong Yale N. Patt High Performance Systems Group Department

More information