Floating Point Numbers
Computer Systems Organization (Spring 2016), CSCI-UA 201, Section 2
Instructor: Joanna Klukowska
Slides adapted from Randal E. Bryant and David R. O'Hallaron (CMU) and Mohamed Zahran (NYU)

Fractions in Binary

Fractional binary numbers

What is 1011.101₂? We use the same idea as when interpreting decimal numbers, except that here we multiply by powers of 2, not 10. The number above is

1 * 2^3 + 0 * 2^2 + 1 * 2^1 + 1 * 2^0 + 1 * 2^-1 + 0 * 2^-2 + 1 * 2^-3 = 8 + 2 + 1 + ½ + ⅛ = 11.625

Simple enough? DNHI: Try to convert the following numbers to their binary representation: 5 1/16, 2 ⅞, 15 ¾. Now try 1/10 and see how that goes. Convert the following binary numbers to their decimal representations: 0.1101, 101.010, 10.101.
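To make the expansion concrete, here is a minimal C sketch that evaluates a binary string position by position, exactly as in the sum above (the helper name bin_to_double is ours, not something from the slides):

```c
#include <stdio.h>

/* Evaluate a binary string such as "1011.101" by summing bit * 2^i,
 * mirroring the positional expansion shown above. */
double bin_to_double(const char *s) {
    double val = 0.0, scale = 0.5;
    for (; *s && *s != '.'; s++)
        val = 2.0 * val + (*s - '0');      /* integer part */
    if (*s == '.')
        for (s++; *s; s++, scale /= 2.0)
            val += (*s - '0') * scale;     /* fractional part */
    return val;
}

int main(void) {
    printf("%f\n", bin_to_double("1011.101"));  /* prints 11.625000 */
    return 0;
}
```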

Not good enough

That way of representing floating point numbers is simple, but it has many limitations. Only numbers that can be written as a sum of powers of 2 can be represented exactly; other numbers have repeating bit representations (this is the same problem as trying to represent ⅓ in decimal as 0.3333333...).

⅓ is 0.0101010101...₂
⅕ is 0.001100110011...₂
1/10 is 0.0001100110011...₂

There is just one possible location for the binary point. This limits how many bits can be used for the fractional part and for the whole number part: we can represent either very large numbers well or very small numbers well, but not both.

Until 1985, floating point numbers were a computer scientist's nightmare, because everybody was using different standards that dealt with the above problems differently. But do not forget this notation just yet - we will use it as part of the better notation.
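A quick way to see the repeating-expansion problem on a real machine (a small demo, not from the slides) is to print 1/10 with more digits than a double actually holds:

```c
#include <stdio.h>

int main(void) {
    /* 1/10 has an infinitely repeating binary expansion, so the stored
     * double is only the closest representable value. */
    double d = 0.1;
    printf("%.20f\n", d);          /* prints 0.10000000000000000555... */

    /* The rounding error shows up in ordinary arithmetic, too. */
    if (0.1 + 0.2 != 0.3)
        printf("0.1 + 0.2 != 0.3\n");   /* this line is printed */
    return 0;
}
```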

1985: IEEE Standard 754

IEEE Floating Point

Established in 1985. Provides a uniform standard for floating point arithmetic, used by most (if not all) current CPUs. Standardizes rounding, overflow, and underflow behavior. Concerns for numerical accuracy were considered more important than a fast hardware implementation, so early hardware performance was not very good.

Floating Point Representation

Numerical form: (-1)^s * M * 2^E
- Sign bit s determines whether the number is negative or positive.
- Significand M (the mantissa) is normally a fractional value in the range [1.0, 2.0).
- Exponent E weights the value by a power of two.

Encoding:
- the most significant bit is the sign bit s
- the exp field encodes E (but is not equal to E)
- the frac field encodes M (but is not equal to M)

Using a Different Number of Bytes

Single precision: 32 bits
Double precision: 64 bits
Extended precision: 80 bits (Intel only)
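These sizes are easy to check from C. A small sketch, with the caveat that sizeof reports storage, not precision; on x86-64 the 80-bit extended format is usually padded to 16 bytes:

```c
#include <stdio.h>

int main(void) {
    printf("float:       %zu bits\n", 8 * sizeof(float));   /* 32 */
    printf("double:      %zu bits\n", 8 * sizeof(double));  /* 64 */
    /* long double is the 80-bit x87 format on Intel, padded in memory */
    printf("long double: %zu bits of storage\n", 8 * sizeof(long double));
    return 0;
}
```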

Interpreting Values of IEEE 754

Normalized Values

Condition: exp ≠ 000…0 and exp ≠ 111…1

Exponent is biased: E = exp - (2^(k-1) - 1), where k is the number of exponent bits and 2^(k-1) - 1 is the bias.
- Single precision: bias = 2^(k-1) - 1 = 127, exp = 1…254, E = -126…127
- Double precision: bias = 2^(k-1) - 1 = 1023, exp = 1…2046, E = -1022…1023
(once we know the number of bits in exp, we can figure out the bias)

Significand has an implied leading 1: M = 1.xxx…x₂, where xxx…x are the bits of frac.
- Smallest value when all frac bits are zero: 000…0, M = 1.0
- Largest value when all frac bits are one: 111…1, M = 2.0 - ε
By assuming the leading bit is 1, we get an extra bit for free.

value = (-1)^s * M * 2^E, E = exp - (2^(k-1) - 1)

Normalized Values - Example

Value: floating point number F = 15213.0, represented using single precision.

15213.0 = 11101101101101.0₂ = 1.1101101101101₂ * 2^13 (same binary sequence)

Significand: M = 1.1101101101101₂, so frac = 11011011011010000000000 (23 bits)
Exponent: E = 13, bias = 127 (for single precision), so exp = 140₁₀ = 10001100₂

Result: 0 10001100 11011011011010000000000
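The worked example can be verified from C by reinterpreting the float's bits. This sketch assumes the standard single-precision layout of 1 sign bit, 8 exp bits, and 23 frac bits:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float f = 15213.0f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret the bits, no conversion */

    unsigned s    = bits >> 31;            /* 1 sign bit   */
    unsigned exp  = (bits >> 23) & 0xFF;   /* 8 exp bits   */
    unsigned frac = bits & 0x7FFFFF;       /* 23 frac bits */

    printf("s = %u, exp = %u, frac = 0x%06x\n", s, exp, frac);
    /* prints: s = 0, exp = 140, frac = 0x6db400,
     * i.e. 0 10001100 11011011011010000000000, as derived above */
    return 0;
}
```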

DNHI: Perform a similar conversion for the following floating point numbers, using the single precision IEEE representation: 1024, ¼, 17.75, -17.75

Why is this encoding better?

For single precision, the value of exp is in the range 0 <= exp <= 255, so the value of E is (roughly) in the range -127 <= E <= 128: we can represent fairly large numbers using 2^128 and some fairly small numbers using 2^-127. For double precision - well, you get the point.

But we always have the leading one in the value of the significand/mantissa, so we cannot represent numbers that are reeeeeaaaaly small. We also need to talk about what happens when exp is all zeroes or all ones.

Denormalized Encoding

Condition: exp = 000…0 (all zeroes)

Exponent value: E = 1 - bias (instead of 0 - bias)
Significand has an implied leading 0 (not 1): M = 0.xxx…x₂, where xxx…x are the bits of frac.

Cases:
- exp = 000…0, frac = 000…0: represents the zero value. Note that we have two distinct values for zero: +0 and -0. (Why?)
- exp = 000…0, frac ≠ 000…0: represents numbers very close to zero (all denormalized encodings represent reeeeeaaaaly small numbers).
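A small demo of both cases (assuming standard single precision, where the smallest positive denormalized value is 2^-23 * 2^-126 = 2^-149):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    /* exp = 0, frac = 1: the smallest positive denormalized float */
    uint32_t bits = 0x00000001;
    float tiny;
    memcpy(&tiny, &bits, sizeof tiny);
    printf("%g\n", tiny);              /* prints 1.4013e-45, i.e. 2^-149 */

    /* The two zeros: different bit patterns, but they compare equal. */
    printf("%g %g\n", 0.0f, -0.0f);    /* prints 0 -0 */
    printf("%d\n", 0.0f == -0.0f);     /* prints 1 */
    return 0;
}
```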

Special Values Encoding

Condition: exp = 111…1 (all ones)

There are only two (well, three) cases here:

Cases 1, 2: exp = 111…1, frac = 000…0
Represents the value ∞ (infinity), both positive and negative. Produced by operations that overflow.
E.g.: 1.0/0.0 = -1.0/-0.0 = +∞, -1.0/0.0 = 1.0/-0.0 = -∞

Case 3: exp = 111…1, frac ≠ 000…0
Not-a-Number (NaN). Represents the case when no numeric value can be determined.
E.g.: sqrt(-1), ∞ - ∞, ∞ * 0
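These special values are easy to produce from C (a small demo; link with -lm for sqrt):

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    double inf = 1.0 / 0.0;               /* overflow: +infinity */
    printf("%g %g\n", inf, -1.0 / 0.0);   /* prints inf -inf */
    printf("%g\n", inf - inf);            /* prints nan (or -nan) */
    printf("%g\n", sqrt(-1.0));           /* prints nan (or -nan) */
    printf("%d\n", isnan(inf * 0.0));     /* prints 1 */
    return 0;
}
```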

Number Line (not to scale)

Example

6-bit IEEE-like encoding
- the most significant bit is the sign bit
- 3-bit encoding for exp, so bias = 2^(3-1) - 1 = 3
- 2-bit encoding for frac

Special values (exp = 111): 011100, 011101, 011110, 011111, 111100, 111101, 111110, 111111

Denormalized values (exp = 000): 000000, 000001, 000010, 000011, 100000, 100001, 100010, 100011
- smallest negative: 100011, M = 0.11₂, E = 1 - bias = -2, val = -0.11₂ * 2^-2 = -0.1875₁₀
- smallest positive (larger than zero): 000001, M = 0.01₂, E = 1 - bias = -2, val = 0.01₂ * 2^-2 = 0.0625₁₀

Normalized values: all others
- smallest positive: 000100, M = 1.00₂, E = 1 - bias = -2, val = 1.0 * 2^-2 = 0.25₁₀
- largest positive: 011011, M = 1.11₂, E = 6 - bias = 3, val = 1.11₂ * 2^3 = 14.00₁₀

All possible 6-bit sequences:
000000 000001 000010 000011 000100 000101 000110 000111
001000 001001 001010 001011 001100 001101 001110 001111
010000 010001 010010 010011 010100 010101 010110 010111
011000 011001 011010 011011 011100 011101 011110 011111
100000 100001 100010 100011 100100 100101 100110 100111
101000 101001 101010 101011 101100 101101 101110 101111
110000 110001 110010 110011 110100 110101 110110 110111
111000 111001 111010 111011 111100 111101 111110 111111

DNHI: Pick 10 different sequences from the table above and figure out their values in decimal.
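To check your answers without doing the arithmetic by hand, here is a sketch of a decoder for this 6-bit format (decode6 is our helper name; link with -lm):

```c
#include <stdio.h>
#include <math.h>

/* Decode one value of the 6-bit format above:
 * 1 sign bit, 3 exp bits (bias 3), 2 frac bits. */
static double decode6(unsigned b) {
    unsigned s = (b >> 5) & 1, exp = (b >> 2) & 7, frac = b & 3;
    double sign = s ? -1.0 : 1.0;
    if (exp == 7)                             /* exp all ones: special   */
        return frac ? NAN : sign * INFINITY;
    if (exp == 0)                             /* exp all zeros: denorm   */
        return sign * ldexp(frac / 4.0, 1 - 3);        /* M = 0.ff, E = 1 - bias */
    return sign * ldexp(1.0 + frac / 4.0, (int)exp - 3); /* M = 1.ff, E = exp - bias */
}

int main(void) {
    printf("%g\n", decode6(0x23));   /* 100011 -> -0.1875 */
    printf("%g\n", decode6(0x01));   /* 000001 ->  0.0625 */
    printf("%g\n", decode6(0x04));   /* 000100 ->  0.25   */
    printf("%g\n", decode6(0x1B));   /* 011011 -> 14      */
    return 0;
}
```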

Properties and Rules of IEEE 754

Special properties of the IEEE encoding

FP zero is the same as integer zero: all bits = 0.

Can we (almost) use unsigned integer comparison?
- must first compare the sign bits
- must consider that -0 = 0
- NaNs are problematic: they compare greater than any other value - what should the comparison yield?
- otherwise the ordering is proper: denormalized vs. normalized, normalized vs. infinity
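A small demo of two of these points (the helper name bits_of is ours): for positive floats the unsigned comparison of bit patterns agrees with the float comparison, but -0 and +0 have different bit patterns while comparing equal.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

static uint32_t bits_of(float f) {
    uint32_t b;
    memcpy(&b, &f, sizeof b);
    return b;
}

int main(void) {
    /* For non-negative floats the encoding orders like an unsigned int. */
    printf("%d\n", bits_of(1.5f) < bits_of(2.25f));   /* prints 1, like 1.5 < 2.25 */

    /* ...but -0 and +0 break the rule: distinct bits, equal values. */
    printf("%d\n", bits_of(-0.0f) == bits_of(0.0f));  /* prints 0 */
    printf("%d\n", -0.0f == 0.0f);                    /* prints 1 */
    return 0;
}
```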

Arithmetic Operations with Rounding

x +f y = Round(x + y)
x *f y = Round(x * y)

Basic idea: compute the exact result, then make it fit into the desired precision.
- possibly overflow if the exponent is too large
- possibly round to fit into frac

Different Rounding Modes

Rounding modes, illustrated with rounding to whole dollars:

Mode                      $1.40  $1.60  $1.50  $2.50  -$1.50
Towards zero              $1     $1     $1     $2     -$1
Round down (towards -∞)   $1     $1     $1     $2     -$2
Round up (towards +∞)     $2     $2     $2     $3     -$1
Towards nearest even      $1     $2     $2     $2     -$2
(default)

Round-to-nearest-even is the same as regular rounding when we are not at the halfway point; when the value is exactly at the halfway point, round towards the nearest even number.
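C99 exposes these modes through fesetround, and rint rounds to an integer using the current mode, so the dollar table can be reproduced directly (a sketch; link with -lm):

```c
#include <stdio.h>
#include <math.h>
#include <fenv.h>

int main(void) {
    double v[] = { 1.40, 1.60, 1.50, 2.50, -1.50 };
    int modes[] = { FE_TOWARDZERO, FE_DOWNWARD, FE_UPWARD, FE_TONEAREST };
    const char *names[] = { "towards zero", "round down", "round up", "nearest even" };

    for (int m = 0; m < 4; m++) {
        fesetround(modes[m]);            /* select the rounding mode */
        printf("%-13s", names[m]);
        for (int i = 0; i < 5; i++)
            printf(" %3g", rint(v[i]));  /* rint honors the current mode */
        printf("\n");
    }
    /* reproduces the table above, e.g. the last row is 1 2 2 2 -2 */
    return 0;
}
```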

Round to Even - a closer look

This is the default rounding mode. All others are statistically biased: the sum of a set of positive numbers would consistently be over- or under-estimated.

Applying it to other decimal places: when exactly halfway between two possible values, round so that the least significant digit is even. E.g., rounding to the nearest hundredth:
7.8949999 → 7.89 (less than halfway - round down)
7.8950001 → 7.90 (greater than halfway - round up)
7.8950000 → 7.90 (halfway - round up, 0 is even)
7.8850000 → 7.88 (halfway - round down, 8 is even)

Rounding Binary Numbers

Binary fractional numbers are "even" when the least significant bit is 0, and exactly halfway when the bits to the right of the rounding position are 100…0₂.

Examples: round to the nearest 1/4 (2 bits right of the binary point)

Value   Binary     Rounded  Action             Rounded Value
2 3/32  10.00011₂  10.00₂   (< ⅛ - round down)  2
2 3/16  10.00110₂  10.01₂   (> ⅛ - round up)    2 ¼
2 ⅞     10.11100₂  11.00₂   (= ⅛ - round up)    3
2 ⅝     10.10100₂  10.10₂   (= ⅛ - round down)  2 ½
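The same examples can be checked with nearbyint, which rounds using the current mode (round-to-nearest-even by default); multiplying by 4 and dividing back rounds to the nearest 1/4 (a sketch; link with -lm):

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    /* 2 3/32, 2 3/16, 2 7/8, 2 5/8 -- all exactly representable */
    double v[] = { 2.09375, 2.1875, 2.875, 2.625 };
    for (int i = 0; i < 4; i++)
        /* scale so the rounding position lands on the integer boundary */
        printf("%.5f -> %.2f\n", v[i], nearbyint(v[i] * 4) / 4);
    /* prints 2.00, 2.25, 3.00, 2.50 -- matching the table above */
    return 0;
}
```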

Multiplication

(-1)^s1 M1 2^E1 * (-1)^s2 M2 2^E2

Exact result: (-1)^s M 2^E
- Sign s: s1 ^ s2 (this is xor, not exponentiation)
- Significand M: M1 * M2
- Exponent E: E1 + E2

Fixing:
- if M >= 2, shift M right, increment E
- if E is out of range, overflow
- round M to fit the frac precision

Implementation: the most expensive part is the multiplication of significands (but that is done just like for integers).
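The scheme can be sketched in C with frexp/ldexp. Note that frexp normalizes the significand into [0.5, 1) rather than [1.0, 2.0), so the fix-up test here is M < 0.5 instead of M >= 2 (a sketch, not how hardware actually does it; link with -lm):

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    double x = 3.0, y = 5.0;
    int e1, e2, e;
    double m1 = frexp(x, &e1);     /* x = m1 * 2^e1, m1 in [0.5, 1) */
    double m2 = frexp(y, &e2);     /* y = m2 * 2^e2 */

    double m = m1 * m2;            /* multiply significands */
    e = e1 + e2;                   /* add exponents */
    if (m < 0.5) { m *= 2; e--; }  /* renormalize back into [0.5, 1) */

    printf("%f\n", ldexp(m, e));   /* prints 15.000000 */
    return 0;
}
```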

Addition

(-1)^s1 M1 2^E1 + (-1)^s2 M2 2^E2, assume E1 > E2

Exact result: (-1)^s M 2^E
- Sign s, significand M: result of signed align & add
- Exponent E: E1

Fixing:
- if M >= 2, shift M right, increment E
- if M < 1, shift M left k positions, decrement E by k
- overflow if E is out of range
- round M to fit the frac precision
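The align-and-add step can be sketched the same way: shift the smaller operand's significand right by the exponent difference, add, then renormalize (again a frexp/ldexp sketch, not real hardware; link with -lm):

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    double x = 6.5, y = 0.75;      /* assume |x| >= |y|, i.e. e1 >= e2 */
    int e1, e2, e;
    double m1 = frexp(x, &e1);     /* 0.8125 * 2^3 */
    double m2 = frexp(y, &e2);     /* 0.75   * 2^0 */

    double m = m1 + ldexp(m2, e2 - e1);  /* align m2 to e1, then add */
    e = e1;
    while (m >= 1.0) { m /= 2; e++; }    /* renormalize into [0.5, 1) */

    printf("%f\n", ldexp(m, e));   /* prints 7.250000 */
    return 0;
}
```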

Properties of Floating Point Addition

Closed under addition? YES - but may generate infinity or NaN.
Commutative? YES.
Associative? NO - because of overflow and the inexactness of rounding:
(3.14 + 1e10) - 1e10 = 0, but 3.14 + (1e10 - 1e10) = 3.14
Is 0 the additive identity? YES.
Does every element have an additive inverse? ALMOST - yes, except for infinities and NaNs.
Monotonicity (a >= b implies a + c >= b + c)? ALMOST - except for infinities and NaNs.
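The associativity counterexample runs as-is in C, using float so that 3.14 is small enough to be absorbed by 1e10 (volatile forces each intermediate to be rounded to single precision):

```c
#include <stdio.h>

int main(void) {
    volatile float s1 = 3.14f + 1e10f;  /* 3.14 is absorbed: s1 == 1e10f */
    printf("%g\n", s1 - 1e10f);         /* prints 0 */

    volatile float s2 = 1e10f - 1e10f;  /* exactly 0 */
    printf("%g\n", 3.14f + s2);         /* prints 3.14 */
    return 0;
}
```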

Properties of Floating Point Multiplication

Closed under multiplication? YES - but may generate infinity or NaN.
Commutative? YES.
Associative? NO - possibility of overflow, inexactness of rounding:
(1e20 * 1e20) * 1e-20 = inf, but 1e20 * (1e20 * 1e-20) = 1e20
Is 1 the multiplicative identity? YES.
Does multiplication distribute over addition? NO - possibility of overflow, inexactness of rounding:
1e20 * (1e20 - 1e20) = 0.0, but 1e20 * 1e20 - 1e20 * 1e20 = NaN
Monotonicity (a >= b and c >= 0 implies a * c >= b * c)? ALMOST - except for infinities and NaNs.
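Both counterexamples likewise run as-is with float (volatile again forces the intermediates to be rounded to single precision, which is where the overflow to infinity happens):

```c
#include <stdio.h>

int main(void) {
    float a = 1e20f, b = 1e-20f;

    volatile float t1 = a * a;     /* 1e40 overflows float: +inf */
    volatile float t2 = a * b;     /* about 1.0 */
    printf("%g\n", t1 * b);        /* (a*a)*b:  prints inf   */
    printf("%g\n", a * t2);        /* a*(a*b):  prints 1e+20 */

    volatile float d = a - a;      /* exactly 0 */
    printf("%g\n", a * d);         /* a*(a-a):        prints 0   */
    printf("%g\n", t1 - t1);       /* a*a - a*a = inf - inf: nan */
    return 0;
}
```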

Floating Point in C

C Language

C guarantees two levels:
float - single precision
double - double precision

Conversions/casting (casting between int, float, and double changes the bit representation):
- double/float → int: truncates the fractional part, like rounding toward zero; not defined when the value is out of range or NaN - generally sets the result to TMin
- int → double: exact conversion, as long as int has a word size of at most 53 bits
- int → float: will round according to the rounding mode
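A short demo of all three conversions (the 2^24 + 1 example is ours: it is the smallest positive integer that does not fit in a float's 24-bit significand):

```c
#include <stdio.h>

int main(void) {
    /* double/float -> int truncates toward zero */
    printf("%d %d\n", (int)2.9, (int)-2.9);   /* prints 2 -2 */

    /* int -> float may round: 2^24 + 1 needs 25 significand bits */
    int big = (1 << 24) + 1;                  /* 16777217 */
    printf("%.1f\n", (float)big);             /* prints 16777216.0 */

    /* int -> double is exact: 32 bits always fit in a 53-bit significand */
    printf("%.1f\n", (double)big);            /* prints 16777217.0 */
    return 0;
}
```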

Puzzles (DNHI)

For each of the following C expressions, either argue that it is true for all possible argument values, or explain why it is not. Assume that neither d nor f is NaN.