Floating-point numbers. Phys 420/580 Lecture 6


Random walk CA: Activate a single cell at site i = 0. For all subsequent time steps, let the active site wander to i := i ± 1 with equal probability.

Random walk CA Q: If we run this model M times, how often is the activated cell found at position i after 25, 100, 400, 1600, 4800 steps? Empirical test: let's allocate storage for the histograms. After N steps the walker lies somewhere in [-N, N], so N steps need 2N+1 elements:

unsigned long int hist25[51];
unsigned long int hist100[201];
unsigned long int hist400[801];
unsigned long int hist1600[3201];
unsigned long int hist4800[9601];

Random walk CA Then accumulate values in the arrays:

int main()
{
    for (unsigned long int m = 0; m < M; ++m)
    {
        int x = 0;
        for (int n = 0; n <= 4800; ++n)
        {
            if (R() < 0.5) ++x; else --x;   // R() returns a uniform random number in [0,1)
            // watch the offset! indices are shifted so that position -N maps to element 0
            if (n == 24 or n == 25) ++hist25[x+25];
            else if (n == 99 or n == 100) ++hist100[x+100];
            else if (n == 399 or n == 400) ++hist400[x+400];
            else if (n == 1599 or n == 1600) ++hist1600[x+1600];
            else if (n == 4799 or n == 4800) ++hist4800[x+4800];
        }
    }
    return 0;
}

Position histograms [plot: histograms of the walker's position for N = 25 through N = 4800 steps; results for M = 500000 runs]

Asymptotic distribution After rescaling, each histogram collapses onto the gaussian (2M/sqrt(2*pi*sigma)) e^(-x^2/(2*sigma)), with variance sigma = N.

Asymptotic distribution In the double limit N, M -> infinity, the rescaled histogram is a perfect gaussian (normal distribution). Amazingly, a smooth, continuous distribution can result from a limiting sequence of discrete histograms; this is an analogue of coarse-graining. Rescaling implicitly turns integers into fractions, which suggests that we can use rational numbers to cover the real line.

Floating-point numbers Floating-point numbers have the form ±f × b^e, with a sign ±, a fraction (or significand) f, a base b, and an exponent e. The adjustable radix point allows for calculation over a wide range of magnitudes. Floating-point numbers are limited by the number of bits used to represent the fraction and the exponent.

Floating-point numbers The real line is dense and uncountably infinite: for any two reals x1 < x2, the interval [x1, x2] ⊂ R is itself uncountable. The FP scheme gives only a partial covering of the real line. [number-line diagram: discrete FP values such as 2^-1, 2^-2, ..., 2^127, plus the special values +0, -0, +inf, -inf, and NaN]

Floating-point numbers A finite representation that manages to span many orders of magnitude. A sort of finite-precision scientific notation, with the significand and exponent encoded in fixed-width binary. There is an equal number of uniformly spaced values in each interval [2^n, 2^(n+1)). Relies on special values (+0, -0, +inf, -inf, NaN).
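This uniform spacing within each interval [2^n, 2^(n+1)) can be seen directly with std::nextafter, which returns the adjacent representable value; a minimal sketch:

#include <cmath>
#include <cstdio>

int main()
{
    // gap between a double and the next representable double above it
    double g1 = std::nextafter(1.0, 2.0) - 1.0;   // spacing just above 2^0
    double g2 = std::nextafter(2.0, 3.0) - 2.0;   // spacing just above 2^1
    double g4 = std::nextafter(4.0, 5.0) - 4.0;   // spacing just above 2^2
    std::printf("%g %g %g\n", g1, g2, g4);        // each gap is twice the previous one
    return 0;
}

Each gap is exactly twice the previous one, because only the exponent has changed between the three intervals.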

Floating point types Intel architecture follows the IEEE 754 standard. float: 1 sign bit, 8-bit exponent field, 23-bit fraction field (32 bits total). double: 1 sign bit, 11-bit exponent field, 52-bit fraction field (64 bits total).

Floating point types Representation of unity (1.0): float: sign bit 0, exponent field 01111111, fraction field all zeros; the stored exponent is offset by 2^(8-1) - 1 = 127, and the leading-order 1 is hidden. double: sign bit 0, exponent field 01111111111 (offset by 2^(11-1) - 1 = 1023), fraction field all zeros.

Floating point types Largest positive number: float: sign bit 0, exponent field 11111110 (the all-on state is reserved), fraction field all ones; the leading-order 1 is hidden. Value is about 3.4 × 10^38. double: sign bit 0, exponent field 11111111110 (all-on state reserved), fraction field all ones. Value is about 1.8 × 10^308.

Floating point types Smallest positive non-denormalized number: float: sign bit 0, exponent field 00000001 (the all-off state is reserved), fraction field all zeros; the leading-order 1 is hidden. Value is 2^-126, about 1.2 × 10^-38. double: sign bit 0, exponent field 00000000001 (all-off state reserved), fraction field all zeros. Value is 2^-1022, about 2.2 × 10^-308.

Floating point types Largest positive denormalized number: float: sign bit 0, exponent field all zeros (denormalized), fraction field all ones; the leading-order 0 is hidden. Value is just below 2^-126. double: sign bit 0, exponent field all zeros (denormalized), fraction field all ones. Value is just below 2^-1022.

Floating point types Negative zero: float: sign bit 1, exponent field all zeros (denormalized), fraction field all zeros; the leading-order 0 is hidden. double: sign bit 1, all remaining bits zero.
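The bit patterns above can be checked by copying a float into a 32-bit integer and masking out the three fields; a small sketch (the helper print_fields is just for illustration):

#include <cstdint>
#include <cstdio>
#include <cstring>

// print the sign, exponent, and fraction fields of a single-precision float
void print_fields(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);          // reinterpret the 32 bits
    unsigned sign     = bits >> 31;               // 1 sign bit
    unsigned exponent = (bits >> 23) & 0xFF;      // 8-bit exponent field
    unsigned fraction = bits & 0x7FFFFF;          // 23-bit fraction field
    std::printf("%g: sign=%u exponent=%u fraction=0x%06X\n",
                x, sign, exponent, fraction);
}

int main()
{
    print_fields(1.0f);     // expect sign=0, exponent=127, fraction=0
    print_fields(-0.0f);    // expect sign=1, exponent=0, fraction=0
    print_fields(3.4e38f);  // near the largest float: exponent=254
    return 0;
}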

Accuracy of FP arithmetic FP arithmetic is by its nature inexact. It is important always to think about accuracy: should we believe the computer's final answer? FP multiplication is relatively safe. FP subtraction of nearly-equal quantities (or addition of equal-magnitude, opposite-sign quantities) can dramatically increase the relative error.
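A two-line illustration of such cancellation: the true difference below is 1e-15, but the stored operand already carries a rounding error of comparable size, so the subtraction comes out with a relative error of roughly 10%.

#include <cstdio>

int main()
{
    double a = 1.0 + 1e-15;          // rounded to the nearest representable double
    double b = 1.0;
    std::printf("%.17g\n", a - b);   // prints about 1.11e-15, not 1e-15
    return 0;
}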

Potential dangers FP operations can yield both overflow and underflow. Additional notes on the class web site will explore the Infinity (Inf) and Not-a-Number (NaN) error states.
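For reference, all of these states are easy to provoke deliberately; a minimal sketch:

#include <cstdio>

int main()
{
    double big  = 1e308;
    double tiny = 1e-308;
    std::printf("%g\n", big * 10.0);    // overflow: prints inf
    std::printf("%g\n", tiny / 1e10);   // gradual underflow: the result is subnormal
    std::printf("%g\n", 0.0 / 0.0);     // undefined operation: prints nan (or -nan)
    return 0;
}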

Potential dangers Associativity breaks down: (u + v) + w ≠ u + (v + w). The following 8-digit decimal floating-point operation has a 5% relative error depending on the order in which operations are performed:

(11111113. + (-11111111.)) + 7.5111111 = 2.0000000 + 7.5111111 = 9.5111111

11111113. + ((-11111111.) + 7.5111111) = 11111113. + (-11111103.) = 10.000000
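The same numbers reproduce the effect in single precision, whose significand also holds roughly seven decimal digits; a small sketch:

#include <cstdio>

int main()
{
    float u = 11111113.0f, v = -11111111.0f, w = 7.5111111f;
    std::printf("%.7f\n", (u + v) + w);   // about 9.511111 (u + v = 2 exactly)
    std::printf("%.7f\n", u + (v + w));   // 10.000000: v + w rounds to -11111103
    return 0;
}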

Potential dangers The distributive law u × (v + w) = (u × v) + (u × w) can also fail badly:

20000.000 × (-6.0000000 + 6.0000003) = 20000.000 × 0.00000030000000 = 0.0060000000

(20000.000 × (-6.0000000)) + (20000.000 × 6.0000003) = -120000.00 + 120000.01 = 0.010000000

Potential dangers It can easily occur that the computed value of 2(u^2 + v^2) is smaller than the computed value of (u + v)^2. Hence, a computed variance is not guaranteed to be positive. Naively calculating the standard deviation can lead to your taking the square root of a negative number:

sigma = (1/n) sqrt( n * sum_{k=1..n} x_k^2 - ( sum_{k=1..n} x_k )^2 )
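A sketch of the danger, comparing the one-pass formula above with the safer two-pass formula; the sample values are illustrative, and with data clustered far from zero the one-pass result can come out wildly wrong or even negative:

#include <cmath>
#include <cstdio>

int main()
{
    // three samples with a large mean and a small spread
    const int n = 3;
    float x[n] = { 10000.0f, 10000.1f, 10000.2f };

    // one-pass ("naive") formula: n*sum(x^2) - (sum x)^2 cancels catastrophically
    float s = 0.0f, s2 = 0.0f;
    for (int k = 0; k < n; ++k) { s += x[k]; s2 += x[k]*x[k]; }
    float var_naive = (n*s2 - s*s) / float(n*n);

    // two-pass formula: subtract the mean first, then square the small deviations
    float mean = s / n, acc = 0.0f;
    for (int k = 0; k < n; ++k) acc += (x[k] - mean)*(x[k] - mean);
    float var_twopass = acc / n;

    // the intended variance is 0.02/3, about 0.0067; if the naive value is
    // negative, sqrt returns NaN
    std::printf("naive: %g (sigma %g)\n", var_naive, std::sqrt(var_naive));
    std::printf("two-pass: %g (sigma %g)\n", var_twopass, std::sqrt(var_twopass));
    return 0;
}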

Potential dangers Many common mathematical relations no longer hold exactly, e.g.

(x + y)(x - y) = x^2 - y^2

sin^2(theta) + cos^2(theta) = 1

Carefully written programs Technical meaning: programs that are numerically correct. This is very difficult to guarantee!

Carefully written programs Which formula should we use to compute the average of x and y?

1. (x + y)/2
2. x/2 + y/2
3. x + ((y - x)/2)
4. y + ((x - y)/2)

Carefully written programs Formula 1, (x + y)/2: may raise an overflow if x and y have the same sign.

Carefully written programs Formula 2, x/2 + y/2: may degrade accuracy (e.g. when x or y is so small that halving it underflows) but is safe from overflows.

Carefully written programs Formulas 3 and 4, x + ((y - x)/2) and y + ((x - y)/2): may raise an overflow if x and y have opposite signs, since the difference y - x can then exceed the largest representable value.
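A small single-precision illustration of the trade-off; the values are chosen near the largest float (about 3.4e38) so that formula 1 overflows while 2 and 3 do not:

#include <cstdio>

int main()
{
    float x = 3.0e38f, y = 3.4e38f;                 // both large, same sign
    std::printf("%g\n", (x + y) / 2.0f);            // x + y overflows: prints inf
    std::printf("%g\n", x / 2.0f + y / 2.0f);       // safe: about 3.2e38
    std::printf("%g\n", x + (y - x) / 2.0f);        // also safe here: y - x is small
    return 0;
}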

Carefully written programs You want functions that are robust: give some thought to the rare or extreme cases that may cause your function to misbehave; avoid overflows and underflows; avoid undefined operations, e.g., 1/0 and 0/0.

Carefully written programs Example: roots of the quadratic equation ax^2 + bx + c = 0. According to the usual formula,

x_{1,2} = (-b ± sqrt(b^2 - 4ac)) / (2a)

Problems can arise if b^2 >> 4ac, so that sqrt(b^2 - 4ac) ≈ |b|. Cancellation can then lead to catastrophic loss of significant digits: one of the roots evaluates as (-b ± |b|)/(2a) ≈ 0, with essentially no correct digits.

Carefully written programs One possible workaround: use exact algebraic manipulations on a per-case basis:

x_1 = (-b + sqrt(b^2 - 4ac)) / (2a) = 2c / (-b - sqrt(b^2 - 4ac))
x_2 = (-b - sqrt(b^2 - 4ac)) / (2a) = 2c / (-b + sqrt(b^2 - 4ac))

For each root, one of the two forms suffers cancellation and the other does not (which one depends on the sign of b); always evaluate the form with no cancellation.

Carefully written programs

#include <cassert>
#include <cmath>
using std::sqrt;  // square root
using std::fabs;  // absolute value

void quadratic_roots(double a, double b, double c,
                     double &x1, double &x2)
{
    const double X2 = b*b - 4*a*c;
    assert(X2 >= 0.0);
    const double X = sqrt(X2);
    const double Ym = -b - X;
    const double Yp = -b + X;
    // keep whichever of -b -/+ sqrt(b^2 - 4ac) avoids cancellation
    const double Y = (fabs(Ym) > fabs(Yp) ? Ym : Yp);
    x1 = 2*c/Y;
    x2 = Y/(2*a);
}
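Hypothetical usage, with coefficients chosen so that b^2 >> 4ac; the naive formula for the small root is shown for comparison and typically loses most of its accuracy to cancellation:

#include <cmath>
#include <cstdio>

// assumes quadratic_roots() from the slide above
int main()
{
    // x^2 - 1e8 x + 1 = 0: roots are approximately 1e8 and 1e-8
    double a = 1.0, b = -1.0e8, c = 1.0;
    double x1, x2;
    quadratic_roots(a, b, c, x1, x2);

    // naive formula for the small root, for comparison
    double naive = (-b - std::sqrt(b*b - 4*a*c)) / (2*a);

    std::printf("stable: %.17g %.17g\n", x1, x2);
    std::printf("naive small root: %.17g\n", naive);   // may be far from 1e-8
    return 0;
}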

Carefully written programs Example: norm of a complex number z = x + iy, |z| = sqrt(x^2 + y^2). Avoid possible overflow when squaring the terms:

|z| = |x| sqrt(1 + r^2), with r = y/x, if |y| < |x|
|z| = |y| sqrt(1 + r^2), with r = x/y, if |x| < |y|
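A minimal sketch of this rescaling; the standard library function std::hypot applies the same kind of protection:

#include <cmath>
#include <cstdio>
#include <utility>

// |x + iy| computed without squaring the larger component directly
double robust_norm(double x, double y)
{
    double ax = std::fabs(x), ay = std::fabs(y);
    if (ax == 0.0 && ay == 0.0) return 0.0;   // avoid 0/0 below
    if (ay > ax) std::swap(ax, ay);           // ensure ax >= ay
    double r = ay / ax;                       // r <= 1, so r*r cannot overflow
    return ax * std::sqrt(1.0 + r*r);
}

int main()
{
    // naive sqrt(x*x + y*y) would overflow here, since (1e200)^2 exceeds the double range
    std::printf("%g\n", robust_norm(3e200, 4e200));   // prints 5e+200
    std::printf("%g\n", std::hypot(3e200, 4e200));    // library version, same idea
    return 0;
}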

Carefully written programs Evaluation by nested polynomials (Horner's scheme):

f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + ... + a_N x^N
     = a_0 + x(a_1 + x(a_2 + x(a_3 + ... + x(a_{N-1} + x a_N) ... )))

double eval_poly(const double f[], double x, int n)
{
    // f holds the n coefficients a_0 ... a_{n-1}, lowest order first
    double val = f[--n];
    do {
        val *= x;
        val += f[--n];
    } while (n != 0);
    return val;
}

Carefully written programs

#include <iostream>
using std::cout;
using std::endl;

int main()
{
    // f(x) = 1 + 20x + 9x^2 - 3x^3 + 5x^4 + 2x^5 + x^6
    double f[7] = { 1, 20, 9, -3, 5, 2, 1 };
    for (int i = 0; i <= 1000; ++i)
    {
        const double x = (i - 500)*3.0/500;
        cout << x << "\t" << eval_poly(f, x, 7) << endl;
    }
    return 0;
}

Carefully written programs Example: the sinc function sin(x)/x. Possible problems as x -> 0. Workaround: explicit power series expansion

sin(x)/x = (x - x^3/3! + x^5/5! - ...)/x = 1 - x^2/3! + x^4/5! - ...

with a sensible cutoff.
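A minimal sketch of such an implementation; the cutoff 1e-4 is an illustrative choice, below which the omitted x^4/5! term is already far smaller than double-precision roundoff:

#include <cmath>
#include <cstdio>

// sin(x)/x with a series expansion near x = 0
double sinc(double x)
{
    if (std::fabs(x) < 1e-4)      // illustrative cutoff
        return 1.0 - x*x/6.0;     // 1 - x^2/3!; the next term is negligible here
    return std::sin(x) / x;
}

int main()
{
    std::printf("%.17g\n", sinc(0.0));   // exactly 1, no 0/0
    std::printf("%.17g\n", sinc(1e-8));
    std::printf("%.17g\n", sinc(1.0));   // about 0.8414709848
    return 0;
}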

Arbitrary precision arithmetic A scheme for performing operations on integers and rational numbers with no rounding, e.g.,

2153/9932 + 871/7362 = 12250579/36559692

Available in symbolic manipulation environments (Maple, Mathematica) and in bignum libraries. Implemented in software; limited by system memory.
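A minimal sketch of exact rational addition, enough to reproduce the example above; a real bignum library would use arbitrary-width integers rather than long long:

#include <cstdio>
#include <numeric>   // std::gcd (C++17)

struct Rational { long long num, den; };

// exact sum of two rationals, reduced to lowest terms
Rational add(Rational a, Rational b)
{
    Rational r{ a.num*b.den + b.num*a.den, a.den*b.den };
    long long g = std::gcd(r.num, r.den);
    return { r.num/g, r.den/g };
}

int main()
{
    Rational r = add({2153, 9932}, {871, 7362});
    std::printf("%lld/%lld\n", r.num, r.den);   // 12250579/36559692
    return 0;
}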