Numerical Methods in Physics. Lecture 1 Intro & IEEE Variable Types and Arithmetic

Pat Scott
Department of Physics, Imperial College
November 1, 2016
Slides available from http://astro.ic.ac.uk/pscott/course-webpage-numerical-methods-201617

Housekeeping
You should all have copies of the course info sheet. Some things to note:
- Course webpage: http://astro.ic.ac.uk/pscott/course-webpage-numerical-methods-201617
- Format: lectures, tutorials + group presentation sessions
  - Lectures Tuesdays 2pm, tutorials Fridays 11am
  - 4 lectures + tutorials in Term 1
  - 5 lectures + tutorials in Term 2
  - 4 two-hour group sessions after all lectures
- Textbook: Numerical Recipes
- Programming & Languages

Housekeeping
- Assessment
  - 75% assignments (9; weekly)
  - 25% group project: 3 × 7% group marks for {presentation, code, report} + 4% for participating in others' talks
  - no exam
- Assignment collaboration/copying policy
- Group project: pairs, 45+15 presentation to the group
  - Choose an advanced topic, let me know what + who your group is ASAP

Outline Variable types 1 Variable types 2 3

Variable types
- logicals/booleans
  - single bit, 0/1 = true/false
- integers
  - multiple logicals, each corresponding to a power of 2
  - i.e. just natural numbers in base 2
  - unsigned or signed (+1 sign bit)
  - e.g. 0 0 1 1 0 1 0 0 = 0 + 0 + 32 + 16 + 0 + 4 + 0 + 0 = 52
- fixed point
  - short (15+1-bit binary strings) or long (31+1)
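The bit pattern on the slide can be checked with a few lines of C++. This is only a minimal sketch; the helper name `bits_to_unsigned` is made up for illustration.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: sum the powers of two selected by a string of
// '0'/'1' characters, most significant bit first. Spaces are skipped
// so that patterns can be grouped for readability.
std::uint32_t bits_to_unsigned(const std::string& bits) {
    std::uint32_t value = 0;
    for (char c : bits) {
        if (c == ' ') continue;
        // Shift the running value up one binary place, then add the new bit.
        value = (value << 1) | (c == '1' ? 1u : 0u);
    }
    return value;
}
```

Calling `bits_to_unsigned("0011 0100")` reproduces the 32 + 16 + 4 = 52 from the slide.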

Variable types
- floating point
  - inexact representation of the real numbers $\mathbb{R}$
  - sign bit $s$, mantissa/significand $M$ and exponent $e$
  - number $= (-1)^s \times M \times B^{e-E}$ (given base $B$ and offset $E$)
  - for this example $B = 2$, $E = 151$
  - single (1 + 23 + 8 = 32-bit), double (1 + 52 + 11 = 64-bit), extended/quadruple (1 + 112 + 15 = 128-bit)
- complex
  - composite of two floating-point numbers plus some interaction terms
The bible: "What Every Computer Scientist Should Know About Floating-Point Arithmetic", Goldberg (1991)
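As a rough illustration of the sign/mantissa/exponent split, the sketch below pulls apart an IEEE 754 single-precision float and rebuilds it from $(-1)^s \times M \times 2^{e-E}$. The names `FloatParts`, `decompose` and `reconstruct` are invented here. One assumption to flag: the slide quotes $E = 151$ for its own example format, whereas for IEEE single precision, with $M$ read as a 24-bit integer that includes the implicit leading 1, the equivalent offset is 127 + 23 = 150. Only normalised numbers are handled.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == 4 && std::numeric_limits<float>::is_iec559,
              "sketch assumes IEEE 754 single precision");

// Hypothetical container for the three bit fields of a 32-bit float.
struct FloatParts {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, stored with a bias
    std::uint32_t mantissa;  // 23 stored bits (implicit leading 1 not included)
};

FloatParts decompose(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined way to read the raw bits
    FloatParts p;
    p.sign = bits >> 31;
    p.exponent = (bits >> 23) & 0xFFu;
    p.mantissa = bits & 0x7FFFFFu;
    return p;
}

// Rebuild the value for normalised numbers only (exponent field 1..254).
double reconstruct(const FloatParts& p) {
    const std::uint32_t M = p.mantissa | (1u << 23);  // restore the implicit 1
    const int E = 150;                                // 127 (bias) + 23 (mantissa bits)
    const double magnitude =
        std::ldexp(static_cast<double>(M), static_cast<int>(p.exponent) - E);
    return p.sign ? -magnitude : magnitude;
}
```

For example, `decompose(1.0f)` gives a stored exponent of 127 and a zero mantissa field, and `reconstruct(decompose(-6.25f))` returns -6.25.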

Variable types
- strings/characters
  - "cat", "dog", "TSM DTC", etc.
  - each character is encoded by some set number of bits
  - in standard ASCII, there are 7 bits encoding $2^7 = 128$ characters (not all printable)
  - 'a' = 97 (dec) = 61 (hex) = 0110 0001 (bin)
- pointers
  - integer memory addresses saved as variables
  - used to refer to other variables, functions, subroutines, pointers, objects, etc.
- references (C++)
  - like pointers, but more of an alias than a variable
- arrays, structures, classes, objects, etc.
  - just bundles of other variables, often with additional operations defined on them
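A small sketch of the character, pointer and reference ideas above. All three function names are hypothetical, and the character code assumes an ASCII-compatible execution character set.

```cpp
// Numeric code of a character: 97 for 'a' in ASCII-compatible encodings.
int char_code(char c) {
    return static_cast<unsigned char>(c);
}

// A pointer is itself a variable holding an address. It may be null and
// may be re-pointed at something else, so it is checked before use.
void set_via_pointer(int* p, int value) {
    if (p != nullptr) *p = value;
}

// A reference is an alias for an existing variable: it cannot be null
// and cannot be re-seated once bound.
void set_via_reference(int& r, int value) {
    r = value;
}
```

With `int x = 0;`, both `set_via_pointer(&x, 3)` and `set_via_reference(x, 7)` modify `x` itself rather than a copy.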

Limitations
- Overflows
  - above the representable range for the number type
  - integers tend to wrap around (or throw errors)
  - floats go to ±Inf
- Underflows
  - only an issue for floats
  - number → ±0
- NaNs
  - "Not a Number"
  - only really exists in the floating-point specification
  - most often (at least in my lousy code) results from
    1. a divide by 0
    2. operations on ±Inf
    3. imaginary results for real functions
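These limits are easy to provoke on purpose. The sketch below assumes IEEE 754 doubles and default compiler settings (no fast-math flags); the function names are invented for illustration. Only unsigned wrap-around is shown, because signed integer overflow is undefined behaviour in C++.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Unsigned integers wrap around modulo 2^N: 255 + 1 becomes 0 for 8 bits.
std::uint8_t wrap_increment(std::uint8_t x) {
    return static_cast<std::uint8_t>(x + 1u);
}

// Doubling the largest finite double overflows to +Inf.
double overflow_to_inf() {
    return std::numeric_limits<double>::max() * 2.0;
}

// Squaring the smallest normalised double is far too small to represent,
// even as a subnormal, so the result underflows to zero.
double underflow_to_zero() {
    const double tiny = std::numeric_limits<double>::min();
    return tiny * tiny;
}

// Plain division, used to show 1/0 -> Inf and 0/0 -> NaN.
double divide(double a, double b) {
    return a / b;
}
```

Here `divide(1.0, 0.0)` gives +Inf while `divide(0.0, 0.0)` gives NaN, as do `overflow_to_inf() - overflow_to_inf()` and `std::sqrt(-1.0)`. Results can be tested with `std::isinf` and `std::isnan`.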

Floating-point Addition and Subtraction
A + B (when A > B):
1. shift B to the same exponent as A
2. integer add/subtract the mantissae
(Example format: base B = 2, offset E = 151 again)
- Two numbers are very different in size ⇒ the smaller is ignored
  - there exists a smallest number that can be added/subtracted meaningfully from 1.0: the machine accuracy
- Two numbers are very similar in size ⇒ subtraction causes cancellation errors
  - significant digits are lost; only those affected by round-off remain
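Both effects can be measured directly. The following is a minimal sketch assuming IEEE 754 doubles; the function names are hypothetical, and `volatile` is used only to force intermediate results to be rounded to double on platforms that compute in wider registers.

```cpp
#include <cmath>

// Halve eps until adding half of it to 1.0 no longer changes the result.
// The value returned is the gap between 1.0 and the next double above it.
double machine_epsilon() {
    double eps = 1.0;
    while (true) {
        volatile double sum = 1.0 + eps / 2.0;  // force rounding to double
        if (sum == 1.0) break;
        eps /= 2.0;
    }
    return eps;
}

// True if adding `small` to `big` leaves `big` unchanged.
bool is_absorbed(double big, double small) {
    volatile double sum = big + small;
    return sum == big;
}

// Relative error of computing x as (1 + x) - 1: the subtraction cancels
// the leading digits and exposes the round-off made in the addition.
double cancellation_relative_error(double x) {
    volatile double sum = 1.0 + x;
    const double recovered = sum - 1.0;
    return std::abs(recovered - x) / x;
}
```

On a typical machine `machine_epsilon()` matches `std::numeric_limits<double>::epsilon()`, `is_absorbed(1.0, 1e-17)` is true, and `cancellation_relative_error(1e-15)` is around ten percent even though each individual operation is correctly rounded.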

Floating-point Multiplication & Division
A × B (or A / B):
1. XOR (multiply) the sign bits
2. integer add/subtract the exponents
3. integer multiply/divide the mantissae
[Figure: number lines near zero with ticks at 0, $\beta^{e_{\min}}$, $\beta^{e_{\min}+1}$, $\beta^{e_{\min}+2}$, $\beta^{e_{\min}+3}$, one for flush to zero and one for gradual underflow]
- Going below the smallest normalisable number in the representation causes headaches
  - subnormal/denormalised numbers are inaccurate and slow
- Rounding/representation error multiplies as floats are operated on
  - always a problem, but magnified with denormalised numbers
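The loss of accuracy in the subnormal range can be seen by comparing the relative spacing of neighbouring doubles. This is a sketch assuming IEEE 754 doubles with gradual underflow enabled (no flush-to-zero mode); the helper names are made up.

```cpp
#include <cmath>
#include <limits>

// True if x lies below the smallest normalised double but is not zero.
bool is_subnormal(double x) {
    return std::fpclassify(x) == FP_SUBNORMAL;
}

// Gap to the next representable double above x, relative to x (x > 0).
double relative_spacing(double x) {
    const double next = std::nextafter(x, std::numeric_limits<double>::infinity());
    return (next - x) / x;
}
```

Around 1.0 the relative spacing equals machine epsilon, whereas for `4 * std::numeric_limits<double>::denorm_min()` it is 0.25: only a couple of significant bits are left.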

Operations on integers
- Integer operations and arithmetic are fast and exact
  - well, except for division: in this case the result is truncated (the remainder is discarded)
  - there are other tricks to get quick rounded results
- Standard logical operations can be performed bitwise on whole integers at once
  - this is good for:
    - quick multiplication or division by 2
    - extracting/setting combinations of flags (booleans)
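A short sketch of these integer tricks, restricted to unsigned values to keep the behaviour simple. The function names and the flag values are hypothetical, chosen only for illustration.

```cpp
#include <cstdint>

// Plain integer division truncates: 7 / 2 gives 3.
std::uint32_t truncated_div(std::uint32_t n, std::uint32_t d) {
    return n / d;
}

// One common rounding trick: add half the divisor before dividing.
// Assumes d > 0 and that n + d / 2 does not overflow.
std::uint32_t rounded_div(std::uint32_t n, std::uint32_t d) {
    return (n + d / 2) / d;
}

// Shifting left/right by one bit multiplies/divides by 2.
std::uint32_t times_two(std::uint32_t x) { return x << 1; }
std::uint32_t halved(std::uint32_t x) { return x >> 1; }

// Hypothetical flags, one bit each, packed into a single integer.
const std::uint32_t FLAG_VERBOSE = 1u << 0;
const std::uint32_t FLAG_DEBUG = 1u << 1;
const std::uint32_t FLAG_FAST = 1u << 2;

std::uint32_t set_flag(std::uint32_t flags, std::uint32_t f) { return flags | f; }
std::uint32_t clear_flag(std::uint32_t flags, std::uint32_t f) { return flags & ~f; }
bool has_flag(std::uint32_t flags, std::uint32_t f) { return (flags & f) != 0u; }
```

For instance, `set_flag(FLAG_VERBOSE, FLAG_FAST)` yields the bit pattern 101, and `has_flag` on that value is true for `FLAG_FAST` and false for `FLAG_DEBUG`.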

Typing
Languages can be
- statically typed, e.g. C/C++, Fortran, many others
  ⇒ compatibility of variable types for each expression is checked at compile-time, e.g.
      int mynum = 3;
      std::string bird = "duck";
      mynum = 5;     //OK!
      mynum = "5";   //nope.
      bird = mynum;  //err, no. (C++ actually accepts this: the int is silently converted to a char)
- or dynamically typed, e.g. Perl, Python
  - compatibility of types checked only at runtime (or not at all)
  ⇒ type conversion happens automatically
  - sometimes convenient, usually just dangerous (in my opinion)
- Even statically-typed languages can have implicit typing
The golden rule: always declare variable types explicitly in your numerical codes!!!

Default widths
- Most new systems are 64-bit
- Most scientific code was written on 32-bit machines
- Variable widths can differ!
- Classic nasty bug: passing pointers between C and Fortran
  - a C pointer has the same width as a default 32-bit integer on a 32-bit machine, but twice that width (64 bits) on a 64-bit machine
  - Fortran has no separate pointer type (attribute only)
  - linked C-F90 code compiles fine on both machines
  - crashes with seg fault on 64-bit machine, runs OK on 32-bit
  ⇒ not enough space reserved for 64-bit pointer in C-F90 interface
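One way to guard against this class of bug on the C/C++ side is to avoid assuming any width and use the pointer-sized integer type instead. This is a sketch rather than a complete interface; the function names are invented, and it relies on `std::intptr_t`, which the standard lists as optional but which mainstream platforms provide. On the Fortran side the matching kind would be `c_intptr_t` from `iso_c_binding`.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Width of a data pointer on the machine this is compiled for.
constexpr std::size_t pointer_bits() {
    return CHAR_BIT * sizeof(void*);
}

// Fixed-width types keep the same size regardless of the platform.
static_assert(CHAR_BIT * sizeof(std::int32_t) == 32, "int32_t is always 32 bits");
static_assert(sizeof(std::intptr_t) >= sizeof(void*), "intptr_t can hold a pointer");

// Passing a pointer through an integer handle is safe only if the
// integer is wide enough; std::intptr_t is intended for exactly that.
bool survives_round_trip(int* p) {
    const std::intptr_t handle = reinterpret_cast<std::intptr_t>(p);
    return reinterpret_cast<int*>(handle) == p;
}
```

Had the handle been declared as a 32-bit integer, the same round trip could discard the upper half of the address on a 64-bit machine, which is the seg fault described above.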

Comparison using floating-point variables
Never, ever write
    if (myvar == 0.)
or
    if (myvar1 == myvar2)
Floating point numbers cannot be trusted to be exactly equal to anything!!!
Much safer to write
    if (std::abs(myvar) < absprec)
or
    if (std::abs(myvar1/myvar2 - 1.0) < relprec)
where absprec = Aɛ and relprec = Bɛ
Floating point comparisons are only OK to within some suitable multiple A, B >> 1 of ɛ!!!
In practice A and B are set per-algorithm (according to round-off error, speed, truncation error, etc)
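The two safe comparisons might be wrapped up as below. This is a sketch with assumptions: the function names and the default multiples A = B = 100 are arbitrary choices for illustration, and the relative test is written in a division-free form, |a - b| <= relprec * max(|a|, |b|), rather than the slide's |a/b - 1| < relprec, so that it also works when one argument is zero.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Absolute test: |x| < A * epsilon. The default A is an arbitrary example.
bool is_effectively_zero(double x, double A = 100.0) {
    const double absprec = A * std::numeric_limits<double>::epsilon();
    return std::abs(x) < absprec;
}

// Relative test: |a - b| <= B * epsilon * max(|a|, |b|).
// The default B is again just an example; set it per algorithm.
bool nearly_equal(double a, double b, double B = 100.0) {
    const double relprec = B * std::numeric_limits<double>::epsilon();
    return std::abs(a - b) <= relprec * std::max(std::abs(a), std::abs(b));
}
```

With these, `nearly_equal(0.1 + 0.2, 0.3)` is true even though the two sides typically differ in their last bit, while `nearly_equal(1.0, 1.001)` is false.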

Basic optimisation
- Write things in terms of + and *, minimise / if possible
  - i.e. (x+x+x) / (y*z) is faster than 3*x / y / z
- Multiplication is faster than exponentiation
  - i.e. (x*x) is faster than x^2
  - at some stage the extra operations add up though; (x*x*x*x*x) is probably slower than x^5
- Define any simple function inline if possible
- There are lots of other tricks
  - modern compilers will do most of these things for you automatically, but you can usually help them out by thinking about minimising the number and complexity of operations

Renormalisation of variables
- Avoid working with very big floats
  - Speed: most CPUs are quicker doing double precision than extended precision
  - Space: doubles take less space than extendeds (although nobody seems to care nowadays)
  - Accuracy: it is easy to go beyond the max/min representable number
- Also avoid working with very small floats
  - Speed: denormalised arithmetic is slooooow
  - Space: as with big floats, prefer lower precision
  - Accuracy: denormalised arithmetic is inaccurate
- Instead, choose a scale and renormalise: $A \to B = A / A_{\max}$
  - easier + faster to multiply by a scale factor afterwards, even if that requires type conversion
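A standard place where rescaling pays off is the Euclidean norm of a vector with huge entries. The sketch below is one possible illustration rather than a method taken from the lecture; the function names are hypothetical. Each element is divided by the largest magnitude first, so every squared term is at most 1, and the scale factor is multiplied back in at the end.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Naive norm: the squares can overflow to Inf even when the true norm
// is comfortably representable.
double naive_norm(const std::vector<double>& v) {
    double sum = 0.0;
    for (double x : v) sum += x * x;
    return std::sqrt(sum);
}

// Renormalised norm: work with A / A_max, then restore the scale.
double scaled_norm(const std::vector<double>& v) {
    double vmax = 0.0;
    for (double x : v) vmax = std::max(vmax, std::abs(x));
    if (!(vmax > 0.0)) return 0.0;  // empty or all-zero input
    double sum = 0.0;
    for (double x : v) {
        const double r = x / vmax;  // |r| <= 1, so r * r cannot overflow
        sum += r * r;
    }
    return vmax * std::sqrt(sum);
}
```

For the vector {3e200, 4e200} the naive version returns Inf, while the scaled version returns approximately 5e200.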

Renormalisation of variables
- Consider working in log space if your data span a huge range of scales, and you care about the small ones
  - e.g. a range of 2 to 70 is way easier to work with than $100$ to $10^{70}$
  - often a lot faster and more accurate; no huge or tiny floats
  - easier + faster to exponentiate afterwards, even if that requires type conversion
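A product of many small factors is a typical case where log space helps. This is only a sketch with invented function names: instead of multiplying the factors, their base-10 logarithms are summed, which stays well inside the representable range.

```cpp
#include <cmath>
#include <vector>

// Direct product: underflows to zero once the result drops below the
// smallest representable double.
double naive_product(const std::vector<double>& factors) {
    double product = 1.0;
    for (double f : factors) product *= f;
    return product;
}

// Log-space product: returns log10 of the product. Assumes all factors
// are strictly positive.
double log10_product(const std::vector<double>& factors) {
    double log_sum = 0.0;
    for (double f : factors) log_sum += std::log10(f);
    return log_sum;
}
```

With 1000 factors of 1e-5 the direct product is exactly zero, while the log-space version returns a value very close to -5000, i.e. a product of $10^{-5000}$.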

OK, that's it for today.
- Assignment 1 is available on the homepage
  - Covers: floating point issues
  - Due in: Monday Nov 7 (via your BitBucket git repo)
- Next meeting: Tutorial, 11am Friday Nov 4 (here)
- Next lecture: Tuesday Nov 8, Interpolation