Practical Numerical Methods in Physics and Astronomy. Lecture 1 Intro & IEEE Variable Types and Arithmetic

Similar documents
Numerical Methods in Physics. Lecture 1 Intro & IEEE Variable Types and Arithmetic

Chapter 2 Float Point Arithmetic. Real Numbers in Decimal Notation. Real Numbers in Decimal Notation

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Floating Point (with contributions from Dr. Bin Ren, William & Mary Computer Science)

CS 261 Fall Floating-Point Numbers. Mike Lam, Professor.

Floating Point. CSE 351 Autumn Instructor: Justin Hsia

Floating Point January 24, 2008

Floating Point Puzzles The course that gives CMU its Zip! Floating Point Jan 22, IEEE Floating Point. Fractional Binary Numbers.

CS 261 Fall Floating-Point Numbers. Mike Lam, Professor.

Floating Point Puzzles. Lecture 3B Floating Point. IEEE Floating Point. Fractional Binary Numbers. Topics. IEEE Standard 754

Module 2: Computer Arithmetic

Floating Point. CSE 351 Autumn Instructor: Justin Hsia

System Programming CISC 360. Floating Point September 16, 2008

Floating Point Numbers

Floating Point. CSE 351 Autumn Instructor: Justin Hsia

Floating Point Numbers

Floating Point Numbers

Floating Point Puzzles. Lecture 3B Floating Point. IEEE Floating Point. Fractional Binary Numbers. Topics. IEEE Standard 754

COMP2611: Computer Organization. Data Representation

Giving credit where credit is due

Giving credit where credit is due

Systems I. Floating Point. Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties

CS321 Introduction To Numerical Methods

Topic Notes: Bits and Bytes and Numbers

Representing and Manipulating Floating Points. Computer Systems Laboratory Sungkyunkwan University

Basics of Computation. PHY 604:Computational Methods in Physics and Astrophysics II

Foundations of Computer Systems

Inf2C - Computer Systems Lecture 2 Data Representation

Floating point. Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties. Next time. !

Data Representation Floating Point

Data Representation Type of Data Representation Integers Bits Unsigned 2 s Comp Excess 7 Excess 8

Computer Organization: A Programmer's Perspective

Roundoff Errors and Computer Arithmetic

Floating Point Numbers

Representing and Manipulating Floating Points. Jo, Heeseung

Floating Point : Introduction to Computer Systems 4 th Lecture, May 25, Instructor: Brian Railing. Carnegie Mellon

Representing and Manipulating Floating Points

Numerical computing. How computers store real numbers and the problems that result

Representing and Manipulating Floating Points

Arithmetic and Bitwise Operations on Binary Data

Written Homework 3. Floating-Point Example (1/2)

Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition. Carnegie Mellon

Data Representation Floating Point

Computer Arithmetic Ch 8

Computer Arithmetic Ch 8

Representing and Manipulating Floating Points

Topic Notes: Bits and Bytes and Numbers

2/5/2018. Expressions are Used to Perform Calculations. ECE 220: Computer Systems & Programming. Our Class Focuses on Four Types of Operator in C

On a 64-bit CPU. Size/Range vary by CPU model and Word size.

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: Bits and Bytes and Numbers

Bits, Words, and Integers

Today: Floating Point. Floating Point. Fractional Binary Numbers. Fractional binary numbers. bi bi 1 b2 b1 b0 b 1 b 2 b 3 b j

CS429: Computer Organization and Architecture

Course Schedule. CS 221 Computer Architecture. Week 3: Plan. I. Hexadecimals and Character Representations. Hexadecimal Representation

What Every Programmer Should Know About Floating-Point Arithmetic

Objects and Types. COMS W1007 Introduction to Computer Science. Christopher Conway 29 May 2003

A Java program contains at least one class definition.

9/2/2016. Expressions are Used to Perform Calculations. ECE 120: Introduction to Computing. Five Arithmetic Operators on Numeric Types

Fixed-Point Math and Other Optimizations

1/31/2017. Expressions are Used to Perform Calculations. ECE 120: Introduction to Computing. Five Arithmetic Operators on Numeric Types

Floating Point. CSE 238/2038/2138: Systems Programming. Instructor: Fatma CORUT ERGİN. Slides adapted from Bryant & O Hallaron s slides

Characters, Strings, and Floats

COSC 243. Data Representation 3. Lecture 3 - Data Representation 3 1. COSC 243 (Computer Architecture)

CSCI 402: Computer Architectures. Arithmetic for Computers (3) Fengguang Song Department of Computer & Information Science IUPUI.

Chapter 03: Computer Arithmetic. Lesson 09: Arithmetic using floating point numbers

Divide: Paper & Pencil

Number Representations

Chapter 4. Operations on Data

Chapter 3. Errors and numerical stability

8/30/2016. In Binary, We Have A Binary Point. ECE 120: Introduction to Computing. Fixed-Point Representations Support Fractions

Floating Point Numbers. Lecture 9 CAP

Floating point. Today. IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Next time.

Chapter 3: Arithmetic for Computers

Binary floating point encodings

CS 33. Data Representation (Part 3) CS33 Intro to Computer Systems VIII 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.

Floating-Point Data Representation and Manipulation 198:231 Introduction to Computer Organization Lecture 3

Computers and programming languages introduction

Overview (4) CPE 101 mod/reusing slides from a UW course. Assignment Statement: Review. Why Study Expressions? D-1

Introduction to C. Why C? Difference between Python and C C compiler stages Basic syntax in C

Floating Point Arithmetic

Data Representation Floating Point

Finite arithmetic and error analysis

The Design and Implementation of a Rigorous. A Rigorous High Precision Floating Point Arithmetic. for Taylor Models

Lecture 13: (Integer Multiplication and Division) FLOATING POINT NUMBERS

Organisasi Sistem Komputer

Midterm 1 topics (in one slide) Bits and bitwise operations. Outline. Unsigned and signed integers. Floating point numbers. Number representation


CHAPTER 5: Representing Numerical Data

CS 101: Computer Programming and Utilization

But first, encode deck of cards. Integer Representation. Two possible representations. Two better representations WELLESLEY CS 240 9/8/15

CS367 Test 1 Review Guide

CO212 Lecture 10: Arithmetic & Logical Unit

Up next. Midterm. Today s lecture. To follow

Objectives. look at floating point representation in its basic form expose errors of a different form: rounding error highlight IEEE-754 standard

Number Systems and Computer Arithmetic

4 Operations On Data 4.1. Foundations of Computer Science Cengage Learning

Floating-Point Arithmetic

Computer Architecture and System Software Lecture 02: Overview of Computer Systems & Start of Chapter 2

Introduction to Computer Systems Recitation 2 May 29, Marjorie Carlson Aditya Gupta Shailin Desai

3.5 Floating Point: Overview

Transcription:

Practical Numerical Methods in Physics and Astronomy Lecture 1 Intro & IEEE Variable Types and Arithmetic Pat Scott Department of Physics, McGill University January 16, 2013 Slides available from http://www.physics.mcgill.ca/ patscott

Housekeeping Variable types You should all have copies of the course info sheet. Some things to note: Format: half lectures by me, half by you Monday lecture times (1pm vs 1:30) Textbook Programming & Languages Assignments: 4, mainly during the first half of the course Assignment collaboration/copying policy Presentations: how they work, scheduling (please choose topic and slot by next lecture!) Assessment: no exam (hooray)

Housekeeping Variable types You should all have copies of the course info sheet. Some things to note: Format: half lectures by me, half by you Monday lecture times (1pm vs 1:30) Textbook Programming & Languages Assignments: 4, mainly during the first half of the course Assignment collaboration/copying policy Presentations: how they work, scheduling (please choose topic and slot by next lecture!) Assessment: no exam (hooray) Ask questions! (especially stupid ones)

Outline Variable types 1 Variable types 2 3

Outline Variable types 1 Variable types 2 3

logicals/booleans Single bit 0/1 = true/false integers multiple logicals, each corresponding to a power of 2 i.e. just natural numbers base 2 unsigned or signed (+1 sign bit) 0 0 1 1 0 1 0 0 + 0 3216 0 4 0 0 = 52 fixed point short (15+1-bit binary strings) or long (31+1)

floating point inexact representation of R sign bit s, mantissa/significand M and exponent e number = ( 1) s M B e E (given base B and offset E) For this example B = 2, E = 129 single (1 + 23 + 8 = 32-bit), double (1 + 52 + 11 = 64-bit), extended/quadruple (1 + 112 + 15 = 128-bit) complex composite of two floating-point numbers plus some interaction terms The bible: What Every Computer Scientist Should Know About Floating-Point Arithmetic Goldberg (1991) (available here)

strings/characters cat, dog, PHYS606, etc. each character is encoded by some set number of bits In standard ASCII, there are 7 bits encoding 2 7 = 128 characters (not all printable) a = 97 (dec) = 61 (Hex) = 0110 0001 (bin) pointers Integer memory addresses used to refer to other variables, functions, subroutines, pointers, objects, etc arrays, structures, classes, objects, etc Just bundles of other variables, often with additional operations defined on them

Limitations Variable types Overflows Above the representable range for number Integers tend to wrap around (or throw errors) Floats go to ±Inf Underflows Only an issue for floats number ±0 NaNs Not a Number Only really exists in floating-point specification Most often (at least in my lousy code) results from 1 a divide by 0 2 operations on ±Inf 3 imaginary results for real functions

Outline Variable types 1 Variable types 2 3

Floating-point Multiplication & Division A B 1 XOR (multiply) the sign bit 2 integer add/subtract the exponents 3 integer multiply/divide the mantissae Flush to zero 0 β e min β e + 1 min β e + 2 min β e + 3 min Gradual underflow 0 β e min β e + 1 min β e + 2 min β e + 3 min Going < smallest normalisable number in the rep. causes headaches subnormal/denormalised numbers are inaccurate and slow Rounding/representation error multiplies as floats are operated on always a problem, but magnified with denormalised numbers

Floating-point Addition and Subtraction A + B (when A > B) 1 shift B to the same exponent as A 2 integer add/subtract the mantissae B = 2, E = 129 again Two numbers are very different in size = the smaller is ignored There exists a smallest number that can be added/subtracted meaningfully from 1.0 machine accuracy Two numbers very similar size = subtraction causes cancellation errors Significant digits are lost, only those affected by round-off remain

on integers Integer operations and arithmetic are fast and exact Well, except for division in this case the remainder is truncated there are other tricks to get quick rounded results Standard logical operations can be performed bitwise on whole integers at once This is good for: quick multiplication or division by 2 extracting/setting combinations of flags (booleans)

Outline Variable types 1 Variable types 2 3

Typing Variable types Languages can be strongly typed C, Fortran, many others = compatibility of variable types for each expression is checked at compile-time i.e. i n t e g e r : : mynum=3 character ( len =4) : : bird= " duck " mynum = 5!OK mynum = " 5 "! FAIL bird = mynum! FAIL or weakly typed e.g. Perl = type conversion happens automatically sometimes convenient, usually just dangerous Even strongly-typed languages can have implicit typing The golden rule: always declare variable types explicitly in your numerical codes!!!

Default widths Variable types Most new systems are 64-bit Most scientific code was written on 32-bit machines Variable widths can differ! Classic nasty bug: passing pointers between C and Fortran C pointer has same width as long int (32 bits) on 32-bit, but twice the width on 64-bit Fortran has no separate pointer type (attribute only) Linked C-F90 code compiles fine on both machines Crashes with seg fault on 64-bit machine, runs OK on 32-bit = not enough space reserved for 64-bit pointer in C-F90 interface

Comparison using floating-point variables Never, ever write or i f ( myvar. eq. 0.d0 ) i f ( myvar1. eq. myvar2 ) Floating point numbers cannot be trusted to be exactly equal to anything!!! Much safer to write i f ( abs ( myvar ). lt. absprec ) or i f ( abs ( myvar1 / myvar2 1.d0 ). lt. relprec ) where absprec = Aɛ and relprec = Bɛ Floating point comparisons only OK to within some suitable multiple A, B >> 1 of ɛ!!! In practice A and B are set per-algorithm (according to round-off error, speed, truncation error, etc)

Basic optimisation Write things in terms of + and, minimise / if possible i.e. (x+x+x) / (y*z) is faster than 3*x / y / z Multiplication is faster than exponentiation i.e. (x*x) is faster than x^2 at some stage the extra operations add up though; (x*x*x*x*x) is probably slower than x^5 Define any simple function inline if possible There are lots of other tricks modern compilers will do most of these things for you automatically, but you can usually help them out by thinking about minimising number and complexity of operations

Renormalisation of variables Avoid working with very big floats Speed - most CPUs are quicker doing double prec than extended prec Space - doubles take less space than extendeds (although nobody seems to care nowadays) Accuracy - it is easy to go beyond the max/min representable number Also avoid working with very small floats Speed - denormalised arithmetic is slooooow Space - as with big floats Accuracy - denormalised arithmetic is inaccurate Instead, choose a scale and renormalise A B A/A max Easier + faster to multiply by a scalefactor afterwards, even if that requires type conversion

Renormalisation of variables Consider working in log space if your data spans a huge range of scales, and you care about the small ones e.g. a range of 2 70 is way easier to work with than 100 10 70 often a lot faster and more accurate; no huge or tiny floats Easier + faster to exponentiate afterwards, even if that requires type conversion

Next lecture: Monday Jan 21 Interpolation, especially cubic splines Assignment 1 is available now on the homepage Covers: Floating point issues & cubic spline interpolation Due in: Wednesday Jan 30 (email is fine)