A Dynamic Program Analysis to find Floating-Point Accuracy Problems
Florian Benz, Saarland University (fbenz@stud.uni-saarland.de)
Andreas Hildebrandt, Johannes-Gutenberg Universität Mainz (andreas.hildebrandt@uni-mainz.de)
Sebastian Hack (hack@cs.uni-saarland.de)
PLDI 2012, Beijing, June 13, 2012
Introduction
Floating-point arithmetic is ubiquitous:
- Almost every language has a floating-point data type
- Most PCs and supercomputers have floating-point accelerators
- It is not well understood by most developers
Our contribution: a dynamic program analysis that assists developers in understanding and tracking down floating-point arithmetic issues in real-world programs.
Floating-Point Arithmetic Problems
- Insufficient Precision
- Cancellation
- Finding the Cause
- Finite Precision
Problem: Insufficient Precision
    float e = …f;
    float sum = 1.0f;
    int i;
    for (i = 0; i < 5; i++) {
        sum += e;
    }
Finally, sum = 1.0, while higher precision yields sum = … .
Single-precision machine epsilon: …
Solution: Side by Side in Higher Precision
Original:
    float e = …f;
    float sum = 1.0f;
    int i;
    for (i = 0; i < 5; i++) {
        sum += e;
    }
Higher precision (shadow):
    e = …;
    sum = 1.0;
    sum += e;
- Side-by-side computation in higher precision
- Shadowing every floating-point value
[Table: value of sum after each iteration; columns: Iteration, Single precision, Higher precision]
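Error Measurement
$$\text{relative error} = \frac{|\text{exact value} - \text{approximate value}|}{|\text{exact value}|}$$
The exact value is approximated by the higher-precision value:
$$\text{relative error} \approx \frac{|\text{higher-precision value} - \text{original value}|}{|\text{higher-precision value}|}$$
Relative errors smaller than machine epsilon (of float or double, respectively) are unavoidable.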
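Problem: Cancellation
[Slide: a benign and a catastrophic cancellation, each showing the exact operands, the inexact (computed) operands, and the relative error of the difference]
Subtracting nearly equal values cancels their leading bits. The cancellation is benign when the remaining bits are still correct, and catastrophic when mostly rounding errors of the operands remain, so the relative error of the result becomes large.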
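Solution: Cancellation Badness
$\text{badness} = \text{canceled bits} - \text{exact bits} + 1$
- Benign example: 7 exact bits, badness 0
- Catastrophic example: 6 exact bits, badness 1
A cancellation is catastrophic when its badness is positive, that is, when at least as many bits are canceled as the operands have exact bits.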
Problem: Finding the Cause
    1 float e = …f;
    2 float x = 0.5f;
    3 float y = 1.0f + x;
    4 float more = y + e;
    5 float diff_e = more - y;
    6 float diff_0 = diff_e - e;
    7 float zero = diff_0 + diff_0;
    8 float result = 2 * zero;
result = …, while higher precision yields result = 0.
Solution: Light-Weight Slicing
Slice of result for the code above: Add (4) → Sub (5) → Sub (6) → Add (7) → Add (8)
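Problem: Finite Precision [1]
$$u_n = \begin{cases} 2 & \text{if } n = 0,\\ -4 & \text{if } n = 1,\\ 111 - \dfrac{1130}{u_{n-1}} + \dfrac{3000}{u_{n-1}\,u_{n-2}} & \text{if } n > 1. \end{cases}$$
Mathematically correct: $\lim_{n\to\infty} u_n = 6$. For all finite precisions: $\lim_{n\to\infty} u_n = 100$.
[1] Muller et al.: Handbook of Floating-Point Arithmetic, Birkhäuser, …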
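Why the two limits differ can be seen from the standard closed form of this recurrence; the derivation below is added for illustration and is not part of the slides. Substituting $u_n = x_{n+1}/x_n$ turns the recurrence into a linear one whose characteristic roots are 5, 6 and 100:

$$x_{n+1} = 111\,x_n - 1130\,x_{n-1} + 3000\,x_{n-2}, \qquad t^3 - 111t^2 + 1130t - 3000 = (t-5)(t-6)(t-100)$$
$$x_n = \alpha\,100^n + \beta\,6^n + \gamma\,5^n \quad\Rightarrow\quad u_n = \frac{\alpha\,100^{n+1} + \beta\,6^{n+1} + \gamma\,5^{n+1}}{\alpha\,100^n + \beta\,6^n + \gamma\,5^n}$$
$$u_0 = 2,\; u_1 = -4 \;\Rightarrow\; \alpha = 0,\; \beta = 3,\; \gamma = -4 \;\Rightarrow\; u_n = \frac{3 \cdot 6^{n+1} - 4 \cdot 5^{n+1}}{3 \cdot 6^n - 4 \cdot 5^n} \to 6$$

The exact iterates keep $\alpha = 0$, but any rounding error acts like a tiny nonzero $\alpha$, and then the $100^n$ terms eventually dominate, so every finite precision converges to 100.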
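Analysis of the Problem
[Plot: u_n over n (values up to 100) for double precision (53 bit), higher precision (here: 128 bit), and the correct values]
- A fully automatic analysis detects no error: double and higher precision both end at 100, so their final results agree.
- The problem can only be discovered with intermediate results.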
Solution: Stages
    int i;
    double u, v, w;
    u = 2;
    v = -4;
    for (i = 3; i <= 50; i++) {
        FPDEBUG_BEGIN_STAGE(0);
        w = 111. - 1130./v + 3000./(v*u);
        u = v;
        v = w;
        FPDEBUG_END_STAGE(0);
    }
Case Study: Biochemical Algorithms Library (BALL), > 400,000 lines of code
    double StretchComponent::updateEnergy() {
        ...
        double distance = atom1->getDistance(*atom2);
        energy += stretch_[i].values.k
                * (distance - stretch_[i].values.r0)
                * (distance - stretch_[i].values.r0);
        ...
    }
- Catastrophic cancellation
- At most 24 bits canceled (double: 53-bit precision)
Cause: distance comes from a function that returns float, so it carries only 24 significant bits:
    float Atom::getDistance(const Atom& a) const
Case Study: GNU Linear Programming Kit (GLPK) [2], > 100,000 lines of code
$$\begin{aligned}
\min\ & -x_{20}\\
\text{s.t. } & (s+1)\,x_1 - x_2 \ge s - 1,\\
& -s\,x_{i-1} + (s+1)\,x_i - x_{i+1} \ge (-1)^i\,(s+1) \quad \text{for } i = 2:19,\\
& -s\,x_{18} - (3s-1)\,x_{19} + 3\,x_{20} \ge -(5s-7),\\
& 0 \le x_i \le 10 \quad \text{for } i = 1:13,\\
& 0 \le x_i \le B \quad \text{for } i = 14:20,\\
& \text{all } x_i \text{ integers.}
\end{aligned}$$
Unique solution if $B \ge 2$: $x = (1, 2, 1, 2, \dots, 1, 2)^T$
[2] Neumaier and Shcherbina: Safe bounds in linear and mixed-integer linear programming, Mathematical Programming, …
Case Study: GNU Linear Programming Kit (GLPK)
Binary search on B:
- B = …: $x = (1, 2, 1, 2, \dots, 1, 2)^T$
- B = …: "Problem has no integer feasible solution"
Comparing the two runs revealed a variable that differs.
[Table: columns Bound, Shadow, Original]
Case Study: Calculix (SPEC CFP2006)
    double DVdot (int size, double y[], double x[]) {
        FPDEBUG_BEGIN();
        double sum = 0.0;
        int i;
        for (i = 0; i < size; i++) {
            sum += y[i] * x[i];
        }
        double errorbound = 1e-2;
        if (FPDEBUG_ERROR_GREATER(&sum, &errorbound)) {
            /* Print arrays x and y */
        }
        FPDEBUG_INSERT_SHADOW(&sum);
        FPDEBUG_END();
        return sum;
    }
Performance: SPEC CFP2006
Benchmark   Original   Analyzed   Slowdown
bwaves      …          …          167x
gamess      0.70 s     …          544x
milc        …          …          224x
gromacs     2.10 s     …          472x
cactusadm   4.70 s     …          1016x
leslie3d    …          …          292x
namd        …          …          957x
soplex      0.03 s     5.00 s     185x
povray      0.90 s     …          444x
calculix    0.07 s     …          244x
GemsFDTD    5.50 s     …          208x
tonto       1.26 s     …          321x
lbm         9.55 s     …          303x
wrf         7.68 s     …          342x
sphinx      …          …          213x
Conclusion
The dynamic program analysis:
- detects floating-point accuracy problems
- detects catastrophic cancellations
- works on large-scale programs
- finds real-world problems
- is open source: github.com/fbenz/fpdebug
Thank you! Questions?
More informationPotential for hardware-based techniques for reuse distance analysis
Michigan Technological University Digital Commons @ Michigan Tech Dissertations, Master's Theses and Master's Reports - Open Dissertations, Master's Theses and Master's Reports 2011 Potential for hardware-based
More information! CS61C : Machine Structures. Lecture 27 Performance II & Inter-machine Parallelism. !!Instructor Paul Pearce!
inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 27 Performance II & Inter-machine Parallelism 2010-08-05!!!Instructor Paul Pearce! DENSITY LIMITS IN HARD DRIVES?! Yesterday Samsung! announced
More informationExploi'ng Compressed Block Size as an Indicator of Future Reuse
Exploi'ng Compressed Block Size as an Indicator of Future Reuse Gennady Pekhimenko, Tyler Huberty, Rui Cai, Onur Mutlu, Todd C. Mowry Phillip B. Gibbons, Michael A. Kozuch Execu've Summary In a compressed
More informationECE 252 / CPS 220 Advanced Computer Architecture I. Administrivia. Instructors. Where to Get Answers
ECE 252 / CPS 220 Advanced Computer Architecture I Fall 2007 Duke University Prof. Daniel Sorin (sorin@ee.duke.edu) Administrivia addresses, email, website, etc. list of topics expected background course
More informationFloating-point Precision vs Performance Trade-offs
Floating-point Precision vs Performance Trade-offs Wei-Fan Chiang School of Computing, University of Utah 1/31 Problem Statement Accelerating computations using graphical processing units has made significant
More information2 Computation with Floating-Point Numbers
2 Computation with Floating-Point Numbers 2.1 Floating-Point Representation The notion of real numbers in mathematics is convenient for hand computations and formula manipulations. However, real numbers
More informationSiNUCA: A Validated Micro-Architecture Simulator
SiNUCA: A Validated Micro-Architecture ulator Marco A. Z. Alves, Matthias Diener, Francis B. Moreira, Philippe O. A. Navaux Informatics Institute Federal University of Rio Grande do Sul Email: {mazalves,
More informationCSI33 Data Structures
Outline Department of Mathematics and Computer Science Bronx Community College September 6, 2017 Outline Outline 1 Chapter 2: Data Abstraction Outline Chapter 2: Data Abstraction 1 Chapter 2: Data Abstraction
More informationAnalysis of Program Based on Function Block
Analysis of Program Based on Function Block Wu Weifeng China National Digital Switching System Engineering & Technological Research Center Zhengzhou, China beewwf@sohu.com Abstract-Basic block in program
More informationComputing System Fundamentals/Trends + Review of Performance Evaluation and ISA Design
Computing System Fundamentals/Trends + Review of Performance Evaluation and ISA Design Computing Element Choices: Computing Element Programmability Spatial vs. Temporal Computing Main Processor Types/Applications
More informationDecoupled Dynamic Cache Segmentation
Appears in Proceedings of the 8th International Symposium on High Performance Computer Architecture (HPCA-8), February, 202. Decoupled Dynamic Cache Segmentation Samira M. Khan, Zhe Wang and Daniel A.
More informationUmbra: Efficient and Scalable Memory Shadowing
Umbra: Efficient and Scalable Memory Shadowing Qin Zhao CSAIL Massachusetts Institute of Technology Cambridge, MA, USA qin zhao@csail.mit.edu Derek Bruening VMware, Inc. bruening@vmware.com Saman Amarasinghe
More informationCS 261 Fall Floating-Point Numbers. Mike Lam, Professor.
CS 261 Fall 2018 Mike Lam, Professor https://xkcd.com/217/ Floating-Point Numbers Floating-point Topics Binary fractions Floating-point representation Conversions and rounding error Binary fractions Now
More informationDynamic Floating-Point Cancellation Detection
Dynamic Floating-Point Cancellation Detection Michael O. Lam Department of Computer Science University of Maryland, College Park Email: lam@cs.umd.edu April 20, 2010 Abstract Floating-point rounding error
More informationFloating Point Representation in Computers
Floating Point Representation in Computers Floating Point Numbers - What are they? Floating Point Representation Floating Point Operations Where Things can go wrong What are Floating Point Numbers? Any
More information