Matthieu Lefebvre, Princeton University. Monday, 10 July 2017. First Computational and Data Science school for HEP (CoDaS-HEP)


Prerequisites: a recent C++ compiler and possibly CMake. git clone https://github.com/mpbl/codas_fpa/ and see README.md for instructions.

Outline:
- Disasters
- Reminder on integers
- Floating point numbers: IEEE 754; rounding modes, exceptions, underflow, ...
- Improving FPA accuracy: Kahan algorithm, FMA
- Computing faster: fast math, reduced precision, mixed precision
- Concurrency
- Conclusion
- References

Patriot Missile failure (rounding errors): 1991, Gulf War. The system failed to track and intercept an incoming Iraqi Scud missile because of an inaccurate calculation of the time since boot, due to computer arithmetic errors.

Explosion of Ariane 5 (overflow): 1996, Kourou, French Guiana. A software error in the inertial reference system: a 64-bit floating-point value was stored into a 16-bit integer.

http://www-users.math.umn.edu/~arnold/disasters/

01_integer/integer_overflow

template <typename Integer>
void world_population() {
  // https://en.wikipedia.org/wiki/List_of_continents_by_population (data from 2010)
  std::cout << "Sizeof(Integer): " << sizeof(Integer) << std::endl;
  std::map<std::string, Integer> continents = {{"Africa", 1'044'107'001},
                                               {"Americas", 943'952'001},
                                               {"Asia", 4'169'860'001},
                                               {"Europe", 735'395'001},
                                               {"Oceania", 36'411'001}};
  Integer total = 0;
  for (auto &continent : continents) {
    total += continent.second;
  }
  std::cout << "Total world population: " << total << std::endl;
}

world_population<int32_t>();

How many people are there worldwide? Does the result make sense? What might be the problem? Why are the data from 2010 and not 2016?

int is not integer. The representation uses a limited number of bits: positive numbers are represented using their binary form; negative numbers often use two's complement. Properties of arithmetic types can be queried using std::numeric_limits (C++). On my machine (and probably on yours):

-2^31 <= int = int32_t <= 2^31 - 1   // 1 bit is used to store the sign
0 <= unsigned int = uint32_t <= 2^32 - 1
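
These limits can be checked directly. A minimal sketch (not from the course repository) that also explains the world-population result:

#include <cstdint>
#include <iostream>
#include <limits>

int main() {
  std::cout << "int32_t min : " << std::numeric_limits<std::int32_t>::min() << '\n'    // -2147483648
            << "int32_t max : " << std::numeric_limits<std::int32_t>::max() << '\n'    // 2147483647
            << "uint32_t max: " << std::numeric_limits<std::uint32_t>::max() << '\n';  // 4294967295
  // The 2010 world population (~6.93e9) exceeds int32_t's maximum, so the
  // accumulation in world_population<int32_t>() overflows.
}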

Floating point numbers represent values that would be too large or too small to be represented as integers (1.4e-45 to 3.4e38 in single precision), as well as numbers that are not representable as integers. Of course, floating-point representations also use only a limited number of bits.

Desirable properties of a floating-point system:
- Speed
- Accuracy: correct results
- Range: large and small numbers
- Portability: runs on different machines, giving the same answer
- Ease of implementation and use: needs to feel natural, at least to the user
(Handbook of Floating-Point Arithmetic)

A number is represented exactly by: significand x base^exponent. For instance: 3.1415 = 31415 x 10^-4. The significand is also called the mantissa or coefficient; the base is also called the radix. The representation is stored in memory using a limited number of bits: 8 bits, 16 bits (half precision), 32 bits (single precision), 64 bits (double precision).

IEEE 754 single precision (1 sign bit, 8 exponent bits, 23 fraction bits):

Exponent    Fraction == 0   Fraction != 0
All zeros   ±0              Subnormal value: (-1)^sign x 2^-126 x 0.fraction (implicit leading 0)
All ones    ±infinity       NaN
Otherwise   Normalized value: (-1)^sign x 2^(exponent-127) x 1.fraction (implicit leading 1)

https://en.wikipedia.org/wiki/Single-precision_floating-point_format
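
To see the three fields concretely, here is a small sketch (not part of the course material) that decodes the bit layout of a float:

#include <cstdint>
#include <cstdio>
#include <cstring>

// Decode the IEEE 754 single-precision fields of a float.
void decode(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);           // safe type punning
  std::uint32_t sign = bits >> 31;
  std::uint32_t exponent = (bits >> 23) & 0xFF;  // 8 bits, biased by 127
  std::uint32_t fraction = bits & 0x7FFFFF;      // 23 bits
  std::printf("%g: sign=%u exponent=%u fraction=0x%06X\n",
              x, (unsigned)sign, (unsigned)exponent, (unsigned)fraction);
}

int main() {
  decode(1.0f);    // sign=0 exponent=127 fraction=0x000000
  decode(-2.5f);   // sign=1 exponent=128 fraction=0x200000
  decode(1e-45f);  // subnormal: exponent=0, nonzero fraction
}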

02_fpa/fpa_roundings

Roundings to nearest:
- Round to nearest, ties to even: rounds to the nearest value; if the number falls midway, it is rounded to the nearest value with an even (zero) least significant bit. This is the default for binary floating point and the recommended default for decimal.
- Round to nearest, ties away from zero: rounds to the nearest value; if the number falls midway, it is rounded to the value above (for positive numbers) or below (for negative numbers). This is intended as an option for decimal floating point.

Directed roundings:
- Round toward 0: directed rounding towards zero (also known as truncation).
- Round toward +infinity: directed rounding towards positive infinity (also known as rounding up or ceiling).
- Round toward -infinity: directed rounding towards negative infinity (also known as rounding down or floor).
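
The rounding mode can be switched at run time through <cfenv>. A minimal sketch (not from the course repository; compile without fast-math, ideally with -frounding-math, since compiler support for changing the FP environment varies):

#include <cfenv>
#include <cstdio>

int main() {
  volatile float big = 1e8f, small = 0.5f;  // volatile: keep the addition at run time
  // 1e8 + 0.5 is not representable in binary32: floats are spaced 8 apart there.
  std::fesetround(FE_TONEAREST);
  std::printf("to nearest : %.1f\n", big + small);  // 100000000.0
  std::fesetround(FE_UPWARD);
  std::printf("toward +inf: %.1f\n", big + small);  // 100000008.0 (next float up)
  std::fesetround(FE_DOWNWARD);
  std::printf("toward -inf: %.1f\n", big + small);  // 100000000.0
  std::fesetround(FE_TONEAREST);                    // restore the default
}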

The IEEE standard defines several FP exceptions. Each can be ignored (a default action is taken) or trapped (an error is signaled).
- Underflow: the result is too small to be represented as a normalized float in its format. If ignored, the operation results in a denormalized float or zero.
- Overflow: the result is too large to be represented as a float in its format. If ignored, the operation results in the appropriate infinity.
- Divide-by-zero: a float is divided by zero. If ignored, the appropriate infinity is returned.
- Invalid: an ill-defined operation, such as 0.0/0.0. If ignored, a quiet NaN is returned.
- Inexact: the result of a floating-point operation is not exact, i.e. the result was rounded. If ignored, the rounded result is returned.
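
The (non-trapping) flags can be inspected with <cfenv>. A small sketch, not from the course material:

#include <cfenv>
#include <cstdio>

static void report(const char *label) {
  std::printf("%s:%s%s%s%s%s\n", label,
              std::fetestexcept(FE_INVALID)   ? " invalid"   : "",
              std::fetestexcept(FE_DIVBYZERO) ? " divbyzero" : "",
              std::fetestexcept(FE_OVERFLOW)  ? " overflow"  : "",
              std::fetestexcept(FE_UNDERFLOW) ? " underflow" : "",
              std::fetestexcept(FE_INEXACT)   ? " inexact"   : "");
  std::feclearexcept(FE_ALL_EXCEPT);
}

int main() {
  volatile double zero = 0.0, huge = 1e308, tiny = 1e-308, sink;
  sink = zero / zero;  report("0.0/0.0      ");  // invalid
  sink = 1.0 / zero;   report("1.0/0.0      ");  // divbyzero
  sink = huge * huge;  report("1e308*1e308  ");  // overflow (and inexact)
  sink = tiny * tiny;  report("1e-308*1e-308");  // underflow (and inexact)
}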

Subnormals (or denormals) are FP numbers smaller than the smallest normalized FP number: they have leading zeros in the significand. For single precision they cover the range from about 10^-38 down to 10^-45. Subnormals guarantee that additions never underflow. Any other operation producing a subnormal will raise an underflow exception if it is also inexact. Hardware is not always able to deal with subnormals, in which case software assist is required: SLOW. To get correct results, even the software algorithms need to be specialized. It is possible to tell the hardware to flush subnormals to zero (FTZ); this will raise the underflow and inexact exceptions.
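
A small sketch of both behaviors (assumes an x86 target with SSE; not from the course material):

#include <cfloat>
#include <cstdio>
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE (x86 SSE specific)

int main() {
  volatile float x = FLT_MIN;                       // smallest normalized float, ~1.18e-38
  std::printf("FLT_MIN/2          = %g\n", x / 2);  // subnormal, ~5.9e-39
  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);       // flush subnormal results to zero
  std::printf("FLT_MIN/2 with FTZ = %g\n", x / 2);  // 0
}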

A C++ rendering of the Kahan (compensated) summation algorithm:

#include <vector>

// Recovers the low-order bits that plain accumulation loses.
// Beware overly-aggressive optimizing compilers (e.g. -ffast-math),
// which may simplify the compensation c away to zero.
double kahan_sum(const std::vector<double> &input) {
  double sum = 0.0;
  double c = 0.0;        // running compensation for lost low-order bits
  for (double x : input) {
    double y = x - c;    // subtract the previous compensation from the next term
    double t = sum + y;  // sum is big, y small: low-order digits of y are lost
    c = (t - sum) - y;   // (t - sum) recovers the high-order part of y;
                         // subtracting y gives minus the lost low part
    sum = t;             // algebraically, c should always be zero
  }
  return sum;
}

https://en.wikipedia.org/wiki/Kahan_summation_algorithm
patriot/patriot.cpp (V. Innocente)
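
As a quick check (assuming the kahan_sum above is in scope): 0.1 is not exactly representable in binary, so a naive loop accumulates rounding error that the compensated sum avoids. A sketch, not from the course material:

#include <cstdio>
#include <vector>

int main() {
  std::vector<double> v(10'000'000, 0.1);
  double naive = 0.0;
  for (double x : v) naive += x;
  std::printf("naive: %.10f\n", naive);         // typically off by ~1e-4
  std::printf("kahan: %.10f\n", kahan_sum(v));  // matches 1000000 to double precision
}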

The Kahan summation algorithm does not work for ill-conditioned sums, in particular when an element is larger than the sum. Other summation algorithms exist: Fast2Sum (Dekker), 2Sum (Knuth et al.), ... Products also have specific algorithms for accurate computation (Dekker); see the Handbook of Floating-Point Arithmetic.
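
A classic illustration of this failure mode (again assuming kahan_sum from above): the exact sum is 2, but 1e100 swamps the unit terms and the compensation cannot recover them:

#include <cstdio>
#include <vector>

int main() {
  std::vector<double> v = {1.0, 1e100, 1.0, -1e100};
  std::printf("kahan: %g (exact result: 2)\n", kahan_sum(v));  // prints 0
}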

Multiplier-Accumulator (MAC), or Fused Multiply-Add (FMA): a x b + c computed by a single hardware unit with a single rounding (IEEE 754-2008), instead of two roundings for a multiplication followed by an addition. A fast FMA can speed up and improve the accuracy of many computations that involve the accumulation of products:
- Dot products
- Matrix multiplication
- Polynomial evaluation (e.g., with Horner's rule)
- Newton's method for evaluating functions
- Convolutions and artificial neural networks

https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
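
The single rounding is directly observable with std::fma. A small sketch, not from the course material (hardware FMA may require -mfma or similar to be fast):

#include <cmath>
#include <cstdio>

int main() {
  double e = std::ldexp(1.0, -27);  // 2^-27
  double a = 1.0 + e, b = 1.0 - e;  // both exactly representable in binary64
  // a*b = 1 - 2^-54 exactly, but that product rounds to 1.0 in binary64,
  // so the separately rounded expression loses the tiny term:
  std::printf("a*b - 1     = %g\n", a * b - 1.0);           // 0
  // fma rounds only once, after the exact multiply-add:
  std::printf("fma(a,b,-1) = %g\n", std::fma(a, b, -1.0));  // -5.55112e-17 = -2^-54
}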

How can the numerical quality of a result be assessed?
- Inverse (backward) analysis, based on the Wilkinson principle: the computed solution is assumed to be the exact solution of a nearby problem; this provides error bounds for the computed results.
- Interval arithmetic: the result of an operation between two intervals contains all values that can be obtained by performing this operation on elements from each interval. It gives guaranteed bounds for each computed result, but the error may be overestimated and specific algorithms are required.
- Probabilistic approach: uses a random rounding mode and estimates the number of exact significant digits of any computed result.

http://www.math.twcu.ac.jp/~conf/fjwnc2015/doc/jezequel.pdf

Typical AVX instruction latencies (cycles):

Operator            Instruction   AVX FP32   AVX FP64
+, -                ADD, SUB      3          3
==, !=              COMISS, CMP   2,3        2,3
cast fp32 <-> fp64  CVT           4          4
|, &, ^             AND, OR       1          1
*                   MUL           5          5
/, sqrt             DIV, SQRT     21-29      21-45
1.f/x, 1.f/sqrt(x)  RCP, RSQRT    7          -
=                   MOV           1,4        1,4
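
RCP and RSQRT are fast because they only approximate (to roughly 12 bits of precision). A quick sketch with SSE intrinsics (x86 assumption; not from the course material):

#include <cstdio>
#include <xmmintrin.h>  // SSE intrinsics (x86)

int main() {
  __m128 x = _mm_set1_ps(3.0f);
  __m128 exact  = _mm_div_ps(_mm_set1_ps(1.0f), x);  // DIVPS: slower, correctly rounded
  __m128 approx = _mm_rcp_ps(x);                     // RCPPS: fast, ~12-bit approximation
  float e, a;
  _mm_store_ss(&e, exact);
  _mm_store_ss(&a, approx);
  std::printf("1/3 correctly rounded: %.8f\n", e);
  std::printf("1/3 via rcp          : %.8f\n", a);   // agrees to only a few digits
}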

man gcc: -ffast-math sets the options:
- -fno-math-errno: do not set errno after calling math functions that are executed with a single instruction.
- -funsafe-math-optimizations: assume that arguments and results are valid; among others, this enables -fassociative-math, which allows re-association of operands in series of floating-point operations. Patriot example?
- -ffinite-math-only: assume that arguments and results are not NaNs or ±Infs.
- -fno-rounding-math: enable transformations and optimizations that assume the default floating-point rounding behavior (the default).
- -fno-signaling-nans: assume that IEEE signaling NaNs do not generate user-visible traps during floating-point operations (the default).
- -fcx-limited-range: omit the range-reduction step when performing complex division.
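
A tiny demonstration of why re-association is dangerous (a sketch; the exact behavior is compiler- and version-dependent):

#include <cstdio>

// Under strict IEEE semantics this prints 0: 1e16 + 1.0 rounds back to 1e16.
// With -ffast-math the compiler may reassociate (x + 1.0) - x into 1.0.
double f(double x) { return (x + 1.0) - x; }

int main() { std::printf("%g\n", f(1e16)); }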

Some rules of thumb:
- Avoid or factor out divisions and square roots if possible, or compile with -Ofast or -ffast-math.
- Prefer linear algebra to trigonometric functions.
- Cache quantities that are used often. No free lunch: at best you trade memory for CPU.
- Choose the precision to match the required accuracy. Squares and square roots decrease precision, and catastrophic precision loss occurs in the subtraction of almost-equal large numbers (see the sketch below).
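
For the last point, a classic sketch (not from the course material): the smaller root of x^2 - 2bx + c, computed two algebraically equivalent ways:

#include <cmath>
#include <cstdio>

int main() {
  float b = 1e4f, c = 1.0f;
  float s = std::sqrt(b * b - c);  // ~9999.99995, rounds to 1e4 in binary32
  float naive  = b - s;            // subtraction of almost-equal numbers
  float stable = c / (b + s);      // same root, no cancellation
  std::printf("naive : %.8g\n", naive);   // typically 0: all digits cancelled
  std::printf("stable: %.8g\n", stable);  // ~5.0000000e-05, close to the exact root
}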

Reduced precision is getting popular for some machine learning applications: the NVIDIA P100 can perform FP16 arithmetic at twice the throughput of FP32. These workloads have a large number of parameters and generally modest accuracy requirements for the final output (is this image a cat? is this a fraudulent application?). Training can be successful with half-precision floating point (16 bits), or with fixed point or integers (as low as 8 bits in some cases). Don't use it blindly in your codes: check first!

http://www.theregister.co.uk/2016/11/10/short_wide_deep_but_not_high/

For some problems precision does matter, e.g. solving the Poisson equation -Δu = f with finite elements (Strzodka et al.).

http://www.nvidia.com/content/nvision2008/tech_presentations/NVIDIA_Research_Summit/NVISION08-Mixed_Precision_Methods_on_GPUs.pdf

Mixed precision (Strzodka et al., reference above): exploit the speed of low precision and obtain a result of high accuracy:
- Compute the residual in high precision (cheap)
- Solve for the correction in low precision (fast)
- Correct in high precision (cheap)
- Iterate until convergence in high precision

Now also half precision inside single-precision codes: https://devblogs.nvidia.com/parallelforall/mixed-precision-programming-cuda-8/
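
A toy sketch of the idea on a 2x2 system (my own illustration, not from the cited slides): the "fast, low-precision" solve is done in float, the residual and the update in double. Each iteration gains roughly the accuracy of the low-precision solve, until double-precision convergence:

#include <cmath>
#include <cstdio>

// Low-precision solver stand-in: Cramer's rule in binary32.
static void solve_low(const float A[2][2], const float r[2], float d[2]) {
  float det = A[0][0] * A[1][1] - A[0][1] * A[1][0];
  d[0] = (r[0] * A[1][1] - r[1] * A[0][1]) / det;
  d[1] = (A[0][0] * r[1] - A[1][0] * r[0]) / det;
}

int main() {
  double A[2][2] = {{4, 1}, {1, 3}}, b[2] = {1, 2}, x[2] = {0, 0};
  float Af[2][2] = {{4, 1}, {1, 3}};  // low-precision copy of A
  for (int it = 0; it < 5; ++it) {
    double r0 = b[0] - (A[0][0] * x[0] + A[0][1] * x[1]);  // residual, high precision
    double r1 = b[1] - (A[1][0] * x[0] + A[1][1] * x[1]);
    std::printf("iteration %d: |residual| = %.3e\n", it, std::hypot(r0, r1));
    float rf[2] = {float(r0), float(r1)}, d[2];
    solve_low(Af, rf, d);                                  // correction, low precision
    x[0] += d[0];                                          // update, high precision
    x[1] += d[1];
  }
  std::printf("x = (%.17g, %.17g)\n", x[0], x[1]);         // exact: (1/11, 7/11)
}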

Concurrency makes it worse! Operations on shared variables (e.g. reductions): concurrency implies an unknown order of operations. This is inherent to concurrency and does not depend on the parallel model, and it gets worse as the degree of parallelism increases, for instance in GPU codes using atomics.
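
The root cause is that floating-point addition is not associative, so a reduction whose order depends on scheduling can give run-to-run different results. A two-line sketch, not from the course material:

#include <cstdio>

int main() {
  double a = 1e16, b = -1e16, c = 1.0;
  std::printf("(a + b) + c = %g\n", (a + b) + c);  // 1
  std::printf("a + (b + c) = %g\n", a + (b + c));  // 0: b + c rounds back to -1e16
}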

Should you worry about the accuracy of every line of code you write? Study your problem/algorithm to understand what level of precision is required/acceptable; usually the answer is already known by your community. Verify your results/programs: convergence tests, statistical tests, analytical solutions, ... Check for performance bottlenecks: see the other CoDaS talks.

Optimal Floating Point Computation: Accuracy, Precision, Speed in Scientific Computing. V. Innocente, 2012.
Handbook of Floating-Point Arithmetic. Muller et al., 2010.
What Every Computer Scientist Should Know About Floating-Point Arithmetic. D. Goldberg. https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html