Floating Point Representation in Computers

Similar documents
CS 6210 Fall 2016 Bei Wang. Lecture 4 Floating Point Systems Continued

Floating Point. The World is Not Just Integers. Programming languages support numbers with fraction

FLOATING POINT NUMBERS

Floating Point Numbers

Divide: Paper & Pencil

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

15213 Recitation 2: Floating Point

Roundoff Errors and Computer Arithmetic

Chapter 2 Float Point Arithmetic. Real Numbers in Decimal Notation. Real Numbers in Decimal Notation

Floating Point January 24, 2008

fractional quantities are typically represented in computers using floating point format this approach is very much similar to scientific notation

Floating Point Arithmetic

Floating Point Puzzles. Lecture 3B Floating Point. IEEE Floating Point. Fractional Binary Numbers. Topics. IEEE Standard 754

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

The Sign consists of a single bit. If this bit is '1', then the number is negative. If this bit is '0', then the number is positive.

Floating Point (with contributions from Dr. Bin Ren, William & Mary Computer Science)

Signed Multiplication Multiply the positives Negate result if signs of operand are different

COMP2611: Computer Organization. Data Representation

Floating point. Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties. Next time. !

Floating-Point Numbers in Digital Computers

Data Representation Floating Point

Floating-Point Numbers in Digital Computers

Data Representation Floating Point

CO212 Lecture 10: Arithmetic & Logical Unit

Systems I. Floating Point. Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties

VHDL IMPLEMENTATION OF IEEE 754 FLOATING POINT UNIT

Floating Point Numbers

Floating Point Numbers

Number Systems Standard positional representation of numbers: An unsigned number with whole and fraction portions is represented as:

C NUMERIC FORMATS. Overview. IEEE Single-Precision Floating-point Data Format. Figure C-0. Table C-0. Listing C-0.

Floating Point Puzzles. Lecture 3B Floating Point. IEEE Floating Point. Fractional Binary Numbers. Topics. IEEE Standard 754

System Programming CISC 360. Floating Point September 16, 2008

Floating-Point Data Representation and Manipulation 198:231 Introduction to Computer Organization Lecture 3

Floating point. Today. IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Next time.

ECE232: Hardware Organization and Design

Floating-point Arithmetic. where you sum up the integer to the left of the decimal point and the fraction to the right.

Floating Point Puzzles The course that gives CMU its Zip! Floating Point Jan 22, IEEE Floating Point. Fractional Binary Numbers.

Floating-point representations

Chapter 3: Arithmetic for Computers

Floating-point representations

Floating-Point Arithmetic

Giving credit where credit is due

Giving credit where credit is due

Computer Architecture Chapter 3. Fall 2005 Department of Computer Science Kent State University

Classes of Real Numbers 1/2. The Real Line

Chapter 03: Computer Arithmetic. Lesson 09: Arithmetic using floating point numbers

Foundations of Computer Systems

Chapter 2 Data Representations

Floating Point Arithmetic

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Computer Architecture and IC Design Lab. Chapter 3 Part 2 Arithmetic for Computers Floating Point

Finite arithmetic and error analysis

Data Representations & Arithmetic Operations

What Every Programmer Should Know About Floating-Point Arithmetic

Characters, Strings, and Floats

CS429: Computer Organization and Architecture

Number Systems and Computer Arithmetic

Numeric Encodings Prof. James L. Frankel Harvard University

Inf2C - Computer Systems Lecture 2 Data Representation

2 Computation with Floating-Point Numbers

Most nonzero floating-point numbers are normalized. This means they can be expressed as. x = ±(1 + f) 2 e. 0 f < 1

CS 261 Fall Floating-Point Numbers. Mike Lam, Professor.

Number Systems. Both numbers are positive

Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition. Carnegie Mellon

2.1.1 Fixed-Point (or Integer) Arithmetic

3.5 Floating Point: Overview

Floating Point. CSE 238/2038/2138: Systems Programming. Instructor: Fatma CORUT ERGİN. Slides adapted from Bryant & O Hallaron s slides

Chapter 2. Data Representation in Computer Systems

CS 261 Fall Floating-Point Numbers. Mike Lam, Professor.

Module 2: Computer Arithmetic

1.2 Round-off Errors and Computer Arithmetic

Floating Point Representation. CS Summer 2008 Jonathan Kaldor

CS Computer Architecture. 1. Explain Carry Look Ahead adders in detail

Floating Point : Introduction to Computer Systems 4 th Lecture, May 25, Instructor: Brian Railing. Carnegie Mellon

Chapter 3. Errors and numerical stability

Chapter Three. Arithmetic

Floating Point Arithmetic

CSCI 402: Computer Architectures. Arithmetic for Computers (3) Fengguang Song Department of Computer & Information Science IUPUI.

Review Questions 26 CHAPTER 1. SCIENTIFIC COMPUTING

Data Representation Floating Point

Scientific Computing. Error Analysis

Math 340 Fall 2014, Victor Matveev. Binary system, round-off errors, loss of significance, and double precision accuracy.

Floating Point Numbers

Matthieu Lefebvre Princeton University. Monday, 10 July First Computational and Data Science school for HEP (CoDaS-HEP)

IEEE-754 floating-point

Computer Organization: A Programmer's Perspective

Representing and Manipulating Floating Points. Jo, Heeseung

Today: Floating Point. Floating Point. Fractional Binary Numbers. Fractional binary numbers. bi bi 1 b2 b1 b0 b 1 b 2 b 3 b j

Computer Arithmetic Floating Point

Representing and Manipulating Floating Points

Number Systems. Decimal numbers. Binary numbers. Chapter 1 <1> 8's column. 1000's column. 2's column. 4's column

Binary floating point encodings

4.1 QUANTIZATION NOISE

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

IEEE Standard 754 Floating Point Numbers

Floating Point Arithmetic. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

COSC 243. Data Representation 3. Lecture 3 - Data Representation 3 1. COSC 243 (Computer Architecture)

CO Computer Architecture and Programming Languages CAPL. Lecture 15

A Decimal Floating Point Arithmetic Unit for Embedded System Applications using VLSI Techniques

Computational Methods CMSC/AMSC/MAPL 460. Representing numbers in floating point Error Analysis. Ramani Duraiswami, Dept. of Computer Science

Transcription:

Floating Point Representation in Computers Floating Point Numbers - What are they? Floating Point Representation Floating Point Operations Where Things can go wrong

What are Floating Point Numbers? Any Number that cannot be completely described using an Integer number. 1, 2, 523. not a floating point number 1.02, e, 0.1. a floating point number Floating Point numbers are at their best when describing a continuous variable. Floating Point numbers are at their worst when describing discrete values.

Floating Point Number Representation. Significand Base Exponent C B Q Floating Point numbers constist of a signed significand (C) and a signed integer exponent(q) of the base(b). Most commonly represented according to the IEEE 754 Standard.

Floating Point Number Representation. Decimal notation is the sum of fractional powers of 10. 0.625= 6 10 2 100 5 1000 Computers store values as the sum of fractional powers of 2. 0.625= 1 2 0 4 1 8

Floating Point Number Representation. Some decimal numbers have no exact representation in base 2 0.1 1 2 1 1 2 0 2 2 0 3 2 4 1 2 5 1 2 0 6 2 0 7 2 1 8 2 9 1 2 0 10 2 0 11 2 1 12 2 1 13 2 0 14 2 15 0 2 1 16 2 1 17 2 0 18 2 19 0 2 1 20 2 1 21 2 0 22 2 0 23 2 24 The significand inifinitely repeats 1100 pattern. 1 2 1 1 2 2 0 2 3 0 2 4 1 2 5 1 2 6 0 2 7 0 2 8 1 2 9 1 2 10 0 2 11 0 2 12 1 2 13 1 2 14 0 2 15 0 2 16 1 2 17 1 2 18 0 2 19 0 2 20 1 2 21 1 2 22 0 2 23 1 2 24 0.100000001490116119384765625 Rounded results in a value not quite 0.1

IEEE 754 Types Name Common name Base Digits (p) Emin Emax Notes Decimal Decimal digits E max binary16 Half precision 2 10+1 14 15 storage, not basic 3.31 4.51 binary32 Single precision 2 23+1 126 127 7.22 38.23 binary64 Double precision 2 52+1 1022 1023 15.95 307.95 binary128 Quadruple precision 2 112+1 16382 16383 34.02 4931.77 A IEEE 754 float can represent a range of finite numbers, +, -, and NaN. The range of finite numbers is determined by the properties of the representation. The range of nonzero magnitudes representable is from 1xB^(Emin - p + 1) to (B^p - 1)^(Emax - p + 1).

IEEE 754 types Representable non-zero numbers are split in to two categories, Normal and Subnormal. The smallest magnitude Normal number is B^Emin. Subnormal numbers are number who's magnitude is less then B^Emin. Subnormal numbers allow for underflow exceptions to fail gracefully, getting smaller with loss of precision instead of snapping to zero.

Floating Point Rounding Rounding occurs when the exact result of an operation requires more precision then available in the significand. IEEE 754 Round off modes. round to nearest, where ties round to the nearest even digit in the required position. round to nearest, where ties round away from zero round towards + round towards - round towards zero

Floating Point Operations To Add or Subtract shift numbers left or right so that both have the same exponent, added, and the result is rounded and normalized. e=5; s=1.234567 (123456.7) + e=2; s=1.017654 (101.7654) e=5; s=1.234567 + e=5; s=0.001017654 (after shifting) -------------------- e=5; s=1.235584654 (true sum: 123558.4654) e=5; s=1.235585 (after rounding)

Floating Point Operations In extreme cases, the sum of two non-zero numbers may be equal to one of them. e=5; s=1.234567 + e= 3; s=9.876543 ###################################### e=5; s=1.234567 + e=5; s=0.00000009876543 (after shifting) ---------------------- e=5; s=1.23456709876543 (true sum) e=5; s=1.234567 (after rounding/normalization)

Floating Point Operations Catastrophic cancellation can result from the loss of precision when two close numbers are subtracted. e=5; s=1.234571 e=5; s=1.234567 ---------------- e=5; s=0.000004 e= 1; s=4.000000 (after rounding/normalization)

Floating Point Operations To multiply, the significands are multiplied while the exponents are added, and the result is rounded and normalized. e=3; s=4.734612 e=5; s=5.417242 ----------------------- e=8; s=25.648538980104 (true product) e=8; s=25.64854 (after rounding) e=9; s=2.564854 (after normalization)

Floating Point Problems Floating point addition and multiplication are not necessarily associative. That is (a + b) + c is not necessarily equal to a + (b + c). 1234.567 (a) + 45.67834 (b) 1280.24534 rounds to 1280.245 1280.245 (a + b) + 0.0004 (c) 1280.2454 rounds to 1280.245 a + (b + c): 45.67834 (b) + 0.0004 (c) 45.67874 45.67874 (b + c) + 1234.567 (a) 1280.24574 rounds to 1280.246

Floating Point Problems Floating point addition and multiplication are not necessarily distributive. That is, (a + b) c may not be the same as a c + b c. 1234.567 3.333333 = 4115.223 1.234567 3.333333 = 4.115223 4115.223 + 4.115223 = 4119.338 but 1234.567 + 1.234567 = 1235.802 1235.802 3.333333 = 4119.340

Floating Point Problems Cancellation: subtraction of nearly equal operands may cause extreme loss of accuracy. Conversions to integer are not intuitive: converting (63.0/9.0) to integer yields 7, but converting (0.63/0.09) may yield 6. This is because conversions generally truncate rather than round. Floor and ceiling functions may produce results which are off by one from the intuitively expected value.

Floating Point Problems Limited exponent range: results might overflow yielding infinity, or underflow yielding a subnormal number or zero.