A Dynamic Program Analysis to find Floating-Point Accuracy Problems


Florian Benz (fbenz@stud.uni-saarland.de), Andreas Hildebrandt (andreas.hildebrandt@uni-mainz.de), Sebastian Hack (hack@cs.uni-saarland.de)
PLDI 2012, Beijing, June 13, 2012

Introduction

- Floating-point arithmetic is ubiquitous
- Almost every language has a floating-point data type
- Most PCs and supercomputers have floating-point accelerators
- Not well understood by most developers

Our Contribution: A dynamic program analysis that assists developers in understanding and tracking down floating-point arithmetic issues in real-world programs.

Floating-Point Arithmetic Problems

- Insufficient Precision
- Cancellation
- Finding the Cause
- Finite Precision

Problem: Insufficient Precision

    float e = f;
    float sum = 1.0f;
    int i;
    for (i = 0; i < 5; i++) {
        sum += e;
    }

- Finally sum = 1.0
- Higher precision yields sum =
- Single precision machine epsilon: f
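The absorption behind this slide can be checked with a small C sketch. The helper name `is_absorbed` is hypothetical, and the increments used in the usage note are assumptions chosen for illustration, since the value of e on the slide did not survive transcription.

```c
#include <float.h>

/* Returns 1 if adding e to base leaves base unchanged in single precision.
   The volatile store forces the sum to be rounded to float even on
   platforms that evaluate float expressions in a wider format. */
int is_absorbed(float base, float e) {
    volatile float rounded = base + e;
    return rounded == base;
}
```

For example, `is_absorbed(1.0f, 0x1p-26f)` holds because 2^-26 is below half the spacing of floats just above 1.0, whereas `is_absorbed(1.0f, FLT_EPSILON)` does not, because 1.0f + FLT_EPSILON is the next representable float.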

Solution: Side by Side in Higher Precision

Original:

    float e = f;
    float sum = 1.0f;
    int i;
    for (i = 0; i < 5; i++) {
        sum += e;
    }

Higher precision:

    e = ;
    sum = 1.0;
    sum += e;

- Side-by-side computation in higher precision
- Shadowing every floating-point value

Comparison table (columns): Iteration | Single precision | Higher precision
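The side-by-side idea can be sketched in plain C by pairing each float with a higher-precision shadow. This is only an illustration at source level, not the tool's implementation: the type `shadowed_float` and the `sf_*` functions are hypothetical names, `double` stands in for "higher precision", and the increment 2^-26 used in the usage note is an assumed value.

```c
#include <assert.h>

/* A float together with a higher-precision shadow of the same quantity. */
typedef struct {
    float  value;   /* what the original program computes */
    double shadow;  /* the same computation carried out in double */
} shadowed_float;

/* Start a shadowed value from a float; both parts agree initially. */
shadowed_float sf_make(float v) {
    shadowed_float s = { v, (double)v };
    return s;
}

/* Add two shadowed values: the float part is rounded to float,
   the shadow part is added in double. */
shadowed_float sf_add(shadowed_float a, shadowed_float b) {
    volatile float rounded = a.value + b.value;
    shadowed_float s = { rounded, a.shadow + b.shadow };
    return s;
}

/* Add e to start n times, tracking both precisions side by side. */
shadowed_float sf_repeat_add(float start, float e, int n) {
    assert(n >= 0);
    shadowed_float sum = sf_make(start);
    shadowed_float inc = sf_make(e);
    for (int i = 0; i < n; i++) {
        sum = sf_add(sum, inc);
    }
    return sum;
}
```

With `sf_repeat_add(1.0f, 0x1p-26f, 5)`, the `value` field stays at 1.0 while the `shadow` field accumulates all five increments, which is the discrepancy the per-iteration comparison is meant to expose.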

Error Measurement

$$\text{relative error} = \left| \frac{\text{exact value} - \text{approximate value}}{\text{exact value}} \right|$$

- Approximate exact value with higher precision value
- Relative errors smaller than machine epsilon are unavoidable (float, double)
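A minimal C sketch of this measurement, with the higher-precision shadow standing in for the exact value. The function name `relative_error` is hypothetical, and the zero check is an added assumption, since the slide does not say how an exact value of zero is handled.

```c
#include <assert.h>

/* Relative error of approx with respect to exact.
   The exact value must be nonzero; here it would be the shadow value. */
double relative_error(double exact, double approx) {
    assert(exact != 0.0);
    double quotient = (exact - approx) / exact;
    return quotient < 0.0 ? -quotient : quotient;
}
```

For the summation example with an assumed increment of 2^-26, `relative_error(1.0 + 5 * 0x1p-26, 1.0)` is roughly 7.5e-8, which is larger than half the spacing of floats at 1.0 and therefore more than a single rounding can explain.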


Problem: Cancellation

- Benign: exact, inexact, relative error
- Catastrophic: exact, inexact, relative error

Solution: Cancellation Badness

- Benign: (canceled) - 7 (exact) + 1 = 0
- Catastrophic: (canceled) - 6 (exact) + 1 = 1
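One way to read the arithmetic on this slide is badness = max(0, canceled - exact + 1). The C sketch below follows that reading, which is an assumption rather than a definition confirmed by the slide. It also counts in bits, whereas the slide's numbers appear to be decimal digits. All function names are hypothetical.

```c
#include <assert.h>

/* Binary exponent e with 2^e <= |x| < 2^(e+1); x must be finite and nonzero. */
static int exponent_of(double x) {
    assert(x != 0.0);
    if (x < 0.0) x = -x;
    int e = 0;
    while (x >= 2.0) { x /= 2.0; e++; }
    while (x < 1.0)  { x *= 2.0; e--; }
    return e;
}

/* Leading bits lost in a - b: the drop in exponent from the larger
   operand to the result (0 if the exponent does not drop). */
int canceled_bits(double a, double b) {
    double r = a - b;
    assert(a != 0.0 && b != 0.0 && r != 0.0);
    int ea = exponent_of(a);
    int eb = exponent_of(b);
    int emax = ea > eb ? ea : eb;
    int drop = emax - exponent_of(r);
    return drop > 0 ? drop : 0;
}

/* Assumed badness measure: positive once the cancellation reaches
   past the bits of the operands that are known to be exact. */
int cancellation_badness(int canceled, int exact) {
    int badness = canceled - exact + 1;
    return badness > 0 ? badness : 0;
}
```

Under this reading, a cancellation is harmless as long as it removes fewer positions than the operands have exact ones, and each position beyond that raises the badness by one.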


Problem: Finding the Cause

    1 float e = f;
    2 float x = 0.5f;
    3 float y = 1.0f + x;
    4 float more = y + e;
    5 float diff_e = more - y;
    6 float diff_0 = diff_e - e;
    7 float zero = diff_0 + diff_0;
    8 float result = 2 * zero;

- result =
- Higher precision yields result = 0

Solution: Light-Weight Slicing

Operations contributing to result in the code above (source line in parentheses): Add (4), Sub (5), Sub (6), Add (7), Add (8)
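The eight-line example can be run in both precisions to see the discrepancy that the slice is meant to explain. In this sketch the function names are hypothetical and the value of e is left as a parameter, because the literal on the slide was lost; the usage note assumes e = 2^-26.

```c
/* The example in single precision. volatile forces every intermediate
   result to be rounded to float. */
float result_single(float e) {
    volatile float x = 0.5f;
    volatile float y = 1.0f + x;
    volatile float more = y + e;           /* line 4: e may be absorbed here */
    volatile float diff_e = more - y;      /* line 5: cancellation exposes it */
    volatile float diff_0 = diff_e - e;    /* line 6 */
    volatile float zero = diff_0 + diff_0; /* line 7 */
    volatile float result = 2 * zero;      /* line 8 */
    return result;
}

/* The same computation in double precision. */
double result_double(double e) {
    volatile double x = 0.5;
    volatile double y = 1.0 + x;
    volatile double more = y + e;
    volatile double diff_e = more - y;
    volatile double diff_0 = diff_e - e;
    volatile double zero = diff_0 + diff_0;
    volatile double result = 2 * zero;
    return result;
}
```

With the assumed e = 2^-26, the addition on line 4 rounds back to 1.5 in single precision, so `result_single(0x1p-26f)` is -2^-24 (that is, -4e), while `result_double(0x1p-26)` is exactly 0.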


Problem: Finite Precision [1]

$$u_n = \begin{cases} 2 & \text{if } n = 0, \\ -4 & \text{if } n = 1, \\ 111 - \dfrac{1130}{u_{n-1}} + \dfrac{3000}{u_{n-1}\,u_{n-2}} & \text{if } n > 1. \end{cases}$$

- Mathematically correct: $\lim_{n \to \infty} u_n = 6$
- For all finite precisions: $\lim_{n \to \infty} u_n = 100$

[1] Muller et al.: Handbook of Floating-Point Arithmetic, Birkhäuser

Analysis of the Problem

[Plot: u against n for double precision (53 bit), higher precision (here: 128 bit), and the correct values; the vertical axis reaches 100]

- Fully automatic analysis detects no error
- Can only be discovered with intermediate results
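To look at intermediate results by hand, the double-precision iteration can be compared with a closed form. The sketch assumes the well-known form of Muller's recurrence, u_n = 111 - 1130/u_{n-1} + 3000/(u_{n-1} u_{n-2}) with u_0 = 2 and u_1 = -4. For these starting values the exact solution is u_n = (4·5^(n+1) - 3·6^(n+1)) / (4·5^n - 3·6^n). Both function names are hypothetical.

```c
#include <assert.h>

/* u_n computed by iterating the recurrence in double precision. */
double muller_double(int n) {
    assert(n >= 0);
    double u = 2.0;   /* u_0 */
    double v = -4.0;  /* u_1 */
    if (n == 0) return u;
    for (int i = 2; i <= n; i++) {
        double w = 111.0 - 1130.0 / v + 3000.0 / (v * u);
        u = v;
        v = w;
    }
    return v;
}

/* u_n from the closed form, with numerator and denominator divided
   by 6^n so that the powers do not overflow. */
double muller_closed_form(int n) {
    assert(n >= 0);
    double r = 1.0;  /* (5/6)^n */
    for (int i = 0; i < n; i++) {
        r *= 5.0 / 6.0;
    }
    return (20.0 * r - 18.0) / (4.0 * r - 3.0);
}
```

Printing both functions for n = 0 to 50 shows the two sequences agreeing at first and then parting ways, with the iterated values settling at 100 and the closed form approaching 6.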

Solution: Stages

    int i;
    double u, v, w;
    u = 2;
    v = -4;
    for (i = 3; i <= 50; i++) {
        FPDEBUG_BEGIN_STAGE(0);
        w = 111. - 1130./v + 3000./(v*u);
        u = v;
        v = w;
        FPDEBUG_END_STAGE(0);
    }

Case Study: Biochemical Algorithms Library (BALL)

More than 400,000 lines of code

    double StretchComponent::updateEnergy() {
        ...
        double distance = atom1->getDistance(*atom2);
        energy += stretch_[i].values.k
                  * (distance - stretch_[i].values.r0)
                  * (distance - stretch_[i].values.r0);
        ...
    }

- Catastrophic cancellation
- At most 24 bits canceled (double: 53 bit precision)

    float Atom::getDistance(const Atom& a) const
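The slide appears to point at the float return type of Atom::getDistance: a distance that passes through a float carries only 24 significant bits, so subtracting a nearby r0 in double can cancel all of them. The following C sketch illustrates that reading with hypothetical helper names and made-up input values. It is not BALL code.

```c
/* Models a distance that is computed precisely but returned as float. */
float distance_as_float(double d) {
    return (float)d;
}

/* One stretch energy term, k * (distance - r0)^2, evaluated in double. */
double stretch_term(double k, double distance, double r0) {
    return k * (distance - r0) * (distance - r0);
}
```

With a true distance of 1 + 2^-30 and r0 = 1, `stretch_term(1.0, distance_as_float(1.0 + 0x1p-30), 1.0)` is 0 because the float rounds to exactly 1, whereas `stretch_term(1.0, 1.0 + 0x1p-30, 1.0)` is 2^-60.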

Case Study: GNU Linear Programming Kit (GLPK) [2]

More than 100,000 lines of code

$$\begin{aligned}
\min\ & -x_{20} \\
\text{s.t. } & (s+1)\,x_1 - x_2 \ge s - 1, \\
& -s\,x_{i-1} + (s+1)\,x_i - x_{i+1} \ge (-1)^i (s+1) \quad \text{for } i = 2:19, \\
& -s\,x_{18} - (3s-1)\,x_{19} + 3\,x_{20} \ge -(5s-7), \\
& 0 \le x_i \le 10 \quad \text{for } i = 1:13, \\
& 0 \le x_i \le B \quad \text{for } i = 14:20, \\
& \text{all } x_i \text{ integers.}
\end{aligned}$$

Unique solution if $B \ge 2$: $x = (1, 2, 1, 2, \ldots, 1, 2)^T$

[2] Neumaier and Shcherbina: Safe bounds in linear and mixed-integer linear programming, Mathematical Programming

Case Study: GNU Linear Programming Kit (GLPK)

Binary search:

- B = : x = (1, 2, 1, 2, ..., 1, 2)^T
- B = : Problem has no integer feasible solution

Then:

- Compared the runs
- Found variable that differs

Comparison table (columns): Bound | Shadow | Original

Case Study: Calculix (SPEC CFP2006)

    double DVdot (int size, double y[], double x[]) {
        FPDEBUG_BEGIN();
        double sum = 0.0;
        int i;
        for (i = 0; i < size; i++) {
            sum += y[i] * x[i];
        }
        double errorbound = 1e-2;
        if (FPDEBUG_ERROR_GREATER(&sum, &errorbound)) {
            /* Print arrays x, and y */
        }
        FPDEBUG_INSERT_SHADOW(&sum);
        FPDEBUG_END();
        return sum;
    }

Performance: SPEC CFP2006

Benchmark | Original | Analyzed | Slowdown
bwaves    |          |          | 167x
gamess    | 0.70 s   |          | 544x
milc      |          |          | 224x
gromacs   | 2.10 s   |          | 472x
cactusadm | 4.70 s   |          | 1016x
leslie3d  |          |          | 292x
namd      |          |          | 957x
soplex    | 0.03 s   | 5.00 s   | 185x
povray    | 0.90 s   |          | 444x
calculix  | 0.07 s   |          | 244x
GemsFDTD  | 5.50 s   |          | 208x
tonto     | 1.26 s   |          | 321x
lbm       | 9.55 s   |          | 303x
wrf       | 7.68 s   |          | 342x
sphinx    |          |          | 213x

Conclusion

Dynamic program analysis:

- Detects floating-point accuracy problems
- Detects catastrophic cancellations
- Works on large-scale programs
- Finds real-world problems
- Is open source: github.com/fbenz/fpdebug

Thank You! Questions?


Algorithm must complete after a finite number of instructions have been executed. Each step must be clearly defined, having only one interpretation. Algorithms 1 algorithm: a finite set of instructions that specify a sequence of operations to be carried out in order to solve a specific problem or class of problems An algorithm must possess the following

More information

PMCTrack: Delivering performance monitoring counter support to the OS scheduler

PMCTrack: Delivering performance monitoring counter support to the OS scheduler PMCTrack: Delivering performance monitoring counter support to the OS scheduler J. C. Saez, A. Pousa, R. Rodríguez-Rodríguez, F. Castro, M. Prieto-Matias ArTeCS Group, Facultad de Informática, Complutense

More information

2.1.1 Fixed-Point (or Integer) Arithmetic

2.1.1 Fixed-Point (or Integer) Arithmetic x = approximation to true value x error = x x, relative error = x x. x 2.1.1 Fixed-Point (or Integer) Arithmetic A base 2 (base 10) fixed-point number has a fixed number of binary (decimal) places. 1.

More information

Computational Methods. Sources of Errors

Computational Methods. Sources of Errors Computational Methods Sources of Errors Manfred Huber 2011 1 Numerical Analysis / Scientific Computing Many problems in Science and Engineering can not be solved analytically on a computer Numeric solutions

More information

Function Interface Analysis: A Principled Approach for Function Recognition in COTS Binaries

Function Interface Analysis: A Principled Approach for Function Recognition in COTS Binaries Function Interface Analysis: A Principled Approach for Function Recognition in COTS Binaries Rui Qiao Stony Brook University ruqiao@cs.stonybrook.edu Abstract Function recognition is one of the key tasks

More information

Mathematical preliminaries and error analysis

Mathematical preliminaries and error analysis Mathematical preliminaries and error analysis Tsung-Ming Huang Department of Mathematics National Taiwan Normal University, Taiwan August 28, 2011 Outline 1 Round-off errors and computer arithmetic IEEE

More information

What Every Programmer Should Know About Floating-Point Arithmetic

What Every Programmer Should Know About Floating-Point Arithmetic What Every Programmer Should Know About Floating-Point Arithmetic Last updated: October 15, 2015 Contents 1 Why don t my numbers add up? 3 2 Basic Answers 3 2.1 Why don t my numbers, like 0.1 + 0.2 add

More information

Software-Controlled Transparent Management of Heterogeneous Memory Resources in Virtualized Systems

Software-Controlled Transparent Management of Heterogeneous Memory Resources in Virtualized Systems Software-Controlled Transparent Management of Heterogeneous Memory Resources in Virtualized Systems Min Lee Vishal Gupta Karsten Schwan College of Computing Georgia Institute of Technology {minlee,vishal,schwan}@cc.gatech.edu

More information

Floating Point Arithmetic

Floating Point Arithmetic Floating Point Arithmetic Computer Systems, Section 2.4 Abstraction Anything that is not an integer can be thought of as . e.g. 391.1356 Or can be thought of as + /

More information

Minimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era

Minimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era Minimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era Dimitris Kaseridis Electrical and Computer Engineering The University of Texas at Austin Austin, TX, USA kaseridis@mail.utexas.edu

More information

IAENG International Journal of Computer Science, 40:4, IJCS_40_4_07. RDCC: A New Metric for Processor Workload Characteristics Evaluation

IAENG International Journal of Computer Science, 40:4, IJCS_40_4_07. RDCC: A New Metric for Processor Workload Characteristics Evaluation RDCC: A New Metric for Processor Workload Characteristics Evaluation Tianyong Ao, Pan Chen, Zhangqing He, Kui Dai, Xuecheng Zou Abstract Understanding the characteristics of workloads is extremely important

More information

Lecture Objectives. Structured Programming & an Introduction to Error. Review the basic good habits of programming

Lecture Objectives. Structured Programming & an Introduction to Error. Review the basic good habits of programming Structured Programming & an Introduction to Error Lecture Objectives Review the basic good habits of programming To understand basic concepts of error and error estimation as it applies to Numerical Methods

More information

High System-Code Security with Low Overhead

High System-Code Security with Low Overhead High System-Code Security with Low Overhead Jonas Wagner, Volodymyr Kuznetsov, George Candea, and Johannes Kinder École Polytechnique Fédérale de Lausanne Royal Holloway, University of London High System-Code

More information

Sampling Dead Block Prediction for Last-Level Caches

Sampling Dead Block Prediction for Last-Level Caches Appears in Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-43), December 2010 Sampling Dead Block Prediction for Last-Level Caches Samira Khan, Yingying Tian,

More information

Impact of Compiler Optimizations on Voltage Droops and Reliability of an SMT, Multi-Core Processor

Impact of Compiler Optimizations on Voltage Droops and Reliability of an SMT, Multi-Core Processor Impact of ompiler Optimizations on Voltage roops and Reliability of an SMT, Multi-ore Processor Youngtaek Kim Lizy Kurian John epartment of lectrical & omputer ngineering The University of Texas at ustin

More information

SIPE: Small Integer Plus Exponent

SIPE: Small Integer Plus Exponent SIPE: Small Integer Plus Exponent Vincent LEFÈVRE AriC, INRIA Grenoble Rhône-Alpes / LIP, ENS-Lyon Arith 21, Austin, Texas, USA, 2013-04-09 Introduction: Why SIPE? All started with floating-point algorithms

More information

Potential for hardware-based techniques for reuse distance analysis

Potential for hardware-based techniques for reuse distance analysis Michigan Technological University Digital Commons @ Michigan Tech Dissertations, Master's Theses and Master's Reports - Open Dissertations, Master's Theses and Master's Reports 2011 Potential for hardware-based

More information

! CS61C : Machine Structures. Lecture 27 Performance II & Inter-machine Parallelism. !!Instructor Paul Pearce!

! CS61C : Machine Structures. Lecture 27 Performance II & Inter-machine Parallelism. !!Instructor Paul Pearce! inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 27 Performance II & Inter-machine Parallelism 2010-08-05!!!Instructor Paul Pearce! DENSITY LIMITS IN HARD DRIVES?! Yesterday Samsung! announced

More information

Exploi'ng Compressed Block Size as an Indicator of Future Reuse

Exploi'ng Compressed Block Size as an Indicator of Future Reuse Exploi'ng Compressed Block Size as an Indicator of Future Reuse Gennady Pekhimenko, Tyler Huberty, Rui Cai, Onur Mutlu, Todd C. Mowry Phillip B. Gibbons, Michael A. Kozuch Execu've Summary In a compressed

More information

ECE 252 / CPS 220 Advanced Computer Architecture I. Administrivia. Instructors. Where to Get Answers

ECE 252 / CPS 220 Advanced Computer Architecture I. Administrivia. Instructors. Where to Get Answers ECE 252 / CPS 220 Advanced Computer Architecture I Fall 2007 Duke University Prof. Daniel Sorin (sorin@ee.duke.edu) Administrivia addresses, email, website, etc. list of topics expected background course

More information

Floating-point Precision vs Performance Trade-offs

Floating-point Precision vs Performance Trade-offs Floating-point Precision vs Performance Trade-offs Wei-Fan Chiang School of Computing, University of Utah 1/31 Problem Statement Accelerating computations using graphical processing units has made significant

More information

2 Computation with Floating-Point Numbers

2 Computation with Floating-Point Numbers 2 Computation with Floating-Point Numbers 2.1 Floating-Point Representation The notion of real numbers in mathematics is convenient for hand computations and formula manipulations. However, real numbers

More information

SiNUCA: A Validated Micro-Architecture Simulator

SiNUCA: A Validated Micro-Architecture Simulator SiNUCA: A Validated Micro-Architecture ulator Marco A. Z. Alves, Matthias Diener, Francis B. Moreira, Philippe O. A. Navaux Informatics Institute Federal University of Rio Grande do Sul Email: {mazalves,

More information

CSI33 Data Structures

CSI33 Data Structures Outline Department of Mathematics and Computer Science Bronx Community College September 6, 2017 Outline Outline 1 Chapter 2: Data Abstraction Outline Chapter 2: Data Abstraction 1 Chapter 2: Data Abstraction

More information

Analysis of Program Based on Function Block

Analysis of Program Based on Function Block Analysis of Program Based on Function Block Wu Weifeng China National Digital Switching System Engineering & Technological Research Center Zhengzhou, China beewwf@sohu.com Abstract-Basic block in program

More information

Computing System Fundamentals/Trends + Review of Performance Evaluation and ISA Design

Computing System Fundamentals/Trends + Review of Performance Evaluation and ISA Design Computing System Fundamentals/Trends + Review of Performance Evaluation and ISA Design Computing Element Choices: Computing Element Programmability Spatial vs. Temporal Computing Main Processor Types/Applications

More information

Decoupled Dynamic Cache Segmentation

Decoupled Dynamic Cache Segmentation Appears in Proceedings of the 8th International Symposium on High Performance Computer Architecture (HPCA-8), February, 202. Decoupled Dynamic Cache Segmentation Samira M. Khan, Zhe Wang and Daniel A.

More information

Umbra: Efficient and Scalable Memory Shadowing

Umbra: Efficient and Scalable Memory Shadowing Umbra: Efficient and Scalable Memory Shadowing Qin Zhao CSAIL Massachusetts Institute of Technology Cambridge, MA, USA qin zhao@csail.mit.edu Derek Bruening VMware, Inc. bruening@vmware.com Saman Amarasinghe

More information

CS 261 Fall Floating-Point Numbers. Mike Lam, Professor.

CS 261 Fall Floating-Point Numbers. Mike Lam, Professor. CS 261 Fall 2018 Mike Lam, Professor https://xkcd.com/217/ Floating-Point Numbers Floating-point Topics Binary fractions Floating-point representation Conversions and rounding error Binary fractions Now

More information

Dynamic Floating-Point Cancellation Detection

Dynamic Floating-Point Cancellation Detection Dynamic Floating-Point Cancellation Detection Michael O. Lam Department of Computer Science University of Maryland, College Park Email: lam@cs.umd.edu April 20, 2010 Abstract Floating-point rounding error

More information

Floating Point Representation in Computers

Floating Point Representation in Computers Floating Point Representation in Computers Floating Point Numbers - What are they? Floating Point Representation Floating Point Operations Where Things can go wrong What are Floating Point Numbers? Any

More information