IEEE 754 Floating-Point Format

C Programming 1 IEEE 754 Floating-Point Format

Floating-Point Decimal Number
$123456. \times 10^{-1} = 12345.6 \times 10^{0} = 1234.56 \times 10^{1} = 123.456 \times 10^{2} = 12.3456 \times 10^{3} = 1.23456 \times 10^{4}$ (normalised) $\approx 0.12345 \times 10^{5} \approx 0.01234 \times 10^{6}$

Note
There are different representations of the same number, and there is no fixed position for the decimal point. Given a fixed number of digits, there may be a loss of precision. Three pieces of information represent a number: the sign of the number, the significand, and the signed exponent of 10.

Note
Given a fixed number of digits, a floating-point representation covers a wider range of values than a fixed-point representation. Naturally, most of the numbers in that range do not have an exact representation.

Example
The range of a fixed-point decimal system with six digits, of which two are after the decimal point, is 0.00 to 9999.99. This gives the same number of values as the six-digit non-negative integers. The range of a floating-point representation of the form $m.mmm \times 10^{ee}$, which also uses six digits, is 0.0, then $0.001 \times 10^{0}$ to $9.999 \times 10^{99}$. Note that the radix 10 is implicit.

In a C Program
Data of type float and double are represented as binary floating-point numbers. These are approximations of real numbers, just as an int is an approximation of the integers. (In general a real number may have infinite information content. It cannot be stored in computer memory and cannot be processed by the CPU.)

IEEE 754 Standard
Most binary floating-point representations follow the IEEE 754 standard. The data type float uses the IEEE 32-bit single-precision format and the data type double uses the IEEE 64-bit double-precision format. A floating-point constant without a suffix (for example 1.5) is treated as a double-precision number by GCC.

Bit Patterns
There are 4294967296 ($2^{32}$) bit patterns in any 32-bit format and 18446744073709551616 ($2^{64}$) in any 64-bit format. A float therefore has the same number of bit patterns as a 32-bit int (and slightly fewer distinct values, since many patterns encode NaN). A floating-point format can still cover a much wider range because its values are distributed non-uniformly over that range.

Single Precision (32 bit)
  bit 31    bits 30-23    bits 22-0
  s         exponent      significand/mantissa
  1 bit     8 bits        23 bits

Double Precision (64 bit)
  bit 63    bits 62-52    bits 51-0
  s         exponent      significand/mantissa
  1 bit     11 bits       52 bits (20 bits in the high 32-bit word, continued in the 32 bits of the low word)

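Bit Pattern
    #include <stdio.h>
    #include <string.h>

    void printfloatbits(float);

    int main() // floatbits.c
    {
        float x;
        printf("enter a floating-point number: ");
        if (scanf("%f", &x) != 1) return 1;   // stop on invalid input
        printf("bits of %f are:\n", x);
        printfloatbits(x);
        putchar('\n');
        return 0;
    }

    // Prints the 32 bits of a, most significant bit first.
    // flag counts the recursion depth, i.e. the bit position being handled.
    void printbits(unsigned int a){
        static int flag = 0;
        if(flag != 32) {
            ++flag;
            printbits(a/2);
            printf("%u ", a%2);
            --flag;
            // extra space after the sign bit (31) and after the exponent (bit 23)
            if(flag == 31 || flag == 23) putchar(' ');
        }
    }

    void printfloatbits(float x){
        unsigned int bits;
        memcpy(&bits, &x, sizeof bits);   // copy the bits without pointer aliasing
        printbits(bits);
    }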

Float Bit Pattern
  float data                 sign  exponent  significand
  1.0                        0     01111111  00000000000000000000000
  -1.0                       1     01111111  00000000000000000000000
  1.7                        0     01111111  10110011001100110011010
  $2.0 \times 10^{-38}$      0     00000001  10110011100011111011101
  $2.0 \times 10^{-39}$      0     00000000  00101011100011100110000

Interpretation of Bits
The most significant bit indicates the sign of the number: one is negative and zero is positive. The next eight bits (11 in the case of double precision) store the exponent of two in biased form ($2^{\text{biased exp}}$). The remaining 23 bits (52 in the case of double precision) are for the significand (mantissa).

Types of Data
Data represented in this format are classified into five groups: normalized numbers, zeros, subnormal (denormal) numbers, infinity and not-a-number (NaN).

Single Precision Data: Interpretation
  Exponent   Significand   Data type
  0          0             ±0
  0          nonzero       ± subnormal number
  1-254      anything      ± normalized number
  255        0             ±∞
  255        nonzero       NaN (not a number)

Double Precision Data: Interpretation
  Exponent   Significand   Data type
  0          0             ±0
  0          nonzero       ± subnormal number
  1-2046     anything      ± normalized number
  2047       0             ±∞
  2047       nonzero       NaN (not a number)

Different Types of float
  Type                               sign  exponent  significand
  Not a number: signaling nan        0     11111111  00000000000000000000001
  Not a number: quiet nan            0     11111111  10000000000000000000001
  Infinity: inf                      0     11111111  00000000000000000000000
  Largest Normal: 3.402823e+38       0     11111110  11111111111111111111111
  Smallest Normal: 1.175494e-38      0     00000001  00000000000000000000000
  Largest De-normal: 1.175494e-38    0     00000000  11111111111111111111111
  Smallest De-normal: 1.401298e-45   0     00000000  00000000000000000000001
  Zero: 0.000000e+00                 0     00000000  00000000000000000000000

Single Precision Normalized Number
Let the sign bit (31) be s, the exponent (30-23) be e and the mantissa (significand or fraction) (22-0) be m. The valid range of the exponent field is 1 to 254 (if e is treated as an unsigned number). The actual exponent is biased by 127 to get e, i.e. the actual value of the exponent is $e - 127$. This gives the range $2^{1-127} = 2^{-126}$ to $2^{254-127} = 2^{127}$.

Single Precision Normalized Number
The normalized significand is 1.m (binary dot). The binary point is before bit 22 and the 1 (one) is not present explicitly. The sign bit s is 1 for a negative number and 0 for a positive number. The value of a normalized number is $(-1)^{s} \times 1.m \times 2^{e-127}$.

An Example
Consider the following 32-bit pattern:
1 1011 0110 011 0000 0000 0000 0000 0000
The value is $(-1)^{1} \times 2^{10110110_2 - 01111111_2} \times 1.011_2 = -1.375 \times 2^{55} = -49539595901075456.0 = -4.9539595901075456 \times 10^{16}$.

An Example
Consider the decimal number +105.625. The equivalent binary representation is
$+1101001.101_2 = +1.101001101_2 \times 2^{6} = +1.101001101_2 \times 2^{133-127} = +1.101001101_2 \times 2^{10000101_2 - 01111111_2}$
In IEEE 754 format: 0 1000 0101 101 0011 0100 0000 0000 0000

An Example
Consider the decimal number +2.7. The equivalent binary representation is
$+10.10\,1100\,1100\,1100\ldots_2 = +1.010\,1100\,1100\ldots_2 \times 2^{1} = +1.010\,1100\,1100\ldots_2 \times 2^{128-127} = +1.010\,1100\ldots_2 \times 2^{10000000_2 - 01111111_2}$
In IEEE 754 format (approximate): 0 1000 0000 010 1100 1100 1100 1100 1101

Range of Significand
The range of the significand for a 32-bit number is 1.0 to $(2.0 - 2^{-23})$.

Count of Numbers
The count of floating-point numbers $x$ with $m \times 2^{i} \le x < m \times 2^{i+1}$ is $2^{23}$, where $1.0 \le m \le 2.0 - 2^{-23}$ and $-126 \le i \le 126$ (for $m = 1.0$ this also holds at $i = 127$).

Count of Numbers
The counts of floating-point numbers within the ranges $[2^{-126}, 2^{-125}), \ldots, [\frac{1}{4}, \frac{1}{2}), [\frac{1}{2}, 1.0), [1.0, 2.0), [2.0, 4.0), \ldots, [1024.0, 2048.0), \ldots, [2^{126}, 2^{127})$ etc. are all equal. In fact there are also $2^{23}$ numbers in the range $[2^{127}, \infty)$.

Single Precision Subnormal Number
The interpretation of a subnormal number (also known as a denormal number) is different. The content of the exponent part (e) is zero and the significand part (m) is non-zero. The value of a subnormal number is $(-1)^{s} \times 0.m \times 2^{-126}$. There is no implicit one in the significand.

Note
The smallest magnitude of a normalized number in single precision is ± 0000 0001 000 0000 0000 0000 0000 0000, whose value is $1.0 \times 2^{-126}$. The largest magnitude of a normalized number in single precision is ± 1111 1110 111 1111 1111 1111 1111 1111, whose value is $1.99999988 \times 2^{127} \approx 3.403 \times 10^{38}$.

Note
The smallest magnitude of a subnormal number in single precision is ± 0000 0000 000 0000 0000 0000 0000 0001, whose value is $2^{-126+(-23)} = 2^{-149}$. The largest magnitude of a subnormal number in single precision is ± 0000 0000 111 1111 1111 1111 1111 1111, whose value is $0.99999988 \times 2^{-126}$.

Note
The smallest subnormal, $2^{-149}$, is close to zero. The largest subnormal, $0.99999988 \times 2^{-126}$, is close to the smallest normalized number, $1.0 \times 2^{-126}$.

Note
Due to the presence of the subnormal numbers, there are $2^{23}$ numbers within the range $[0.0, 1.0 \times 2^{-126})$.

Note
Infinity (∞): 1111 1111 000 0000 0000 0000 0000 0000 is greater than (as an unsigned integer) the largest normal number: 1111 1110 111 1111 1111 1111 1111 1111

Note
The smallest difference between two normalized numbers is $2^{-149}$. This is the same as the difference between any two consecutive subnormal numbers. The largest difference between two consecutive normalized numbers is $2^{104}$. The distribution of values is therefore non-uniform.

± Zeros
There are two zeros (+0 and -0) in the IEEE representation, but testing them for equality gives true.

    #include <stdio.h>
    int main() // twozeros.c
    {
        double a = 0.0, b = -0.0;
        printf("a: %f, b: %f\n", a, b);
        if(a == b) printf("equal\n");
        else printf("unequal\n");
        return 0;
    }

$ cc -Wall twozeros.c
$ ./a.out
a: 0.000000, b: -0.000000
equal

Largest +1 = ∞
The 32-bit pattern for infinity is 0 1111 1111 000 0000 0000 0000 0000 0000
The largest 32-bit normalized number is 0 1111 1110 111 1111 1111 1111 1111 1111
If we treat the largest normalized number as int data and add one to it, we get ∞.

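Largest +1 = ∞
    #include <stdio.h>
    #include <string.h>
    int main() // infinity.c
    {
        float f = 1.0/0.0;            // +infinity
        unsigned int u;
        printf("f: %f\n", f);
        memcpy(&u, &f, sizeof u);     // view the bits as an integer
        --u;                          // one step back from infinity's pattern
        memcpy(&f, &u, sizeof f);
        printf("f: %f\n", f);         // the largest normalized float
        return 0;
    }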

Largest +1 = ∞
$ cc -Wall infinity.c
$ ./a.out
f: inf
f: 340282346638528859811704183484516925440.000000

Note
Infinity can be used in a computation, e.g. we can compute $\tan^{-1} \infty$.

Note
    #include <stdio.h>
    #include <math.h>
    int main() // infinity1.c
    {
        float f;
        f = 1.0/0.0;
        printf("atan(%f) = %f\n", f, atan(f));
        printf("1.0/%f = %f\n", f, 1.0/f);
        return 0;
    }

$ cc -Wall infinity1.c -lm
$ ./a.out
atan(inf) = 1.570796
1.0/inf = 0.000000
$\tan^{-1} \infty = \pi/2$ and $1/\infty = 0$

Note
The value infinity can be used in comparisons: +∞ is larger than any normalized or denormal number. NaN, on the other hand, cannot be used for comparison: every ordered comparison involving a NaN is false, and a NaN is not even equal to itself.

NaN
There are two types of NaN: quiet NaN and signaling NaN. A few cases where we get NaN: 0.0/0.0, ±∞/±∞, 0 × ±∞, −∞ + ∞, sqrt(−1.0), log(−1.0)
https://en.wikipedia.org/wiki/nan

NaN
    #include <stdio.h>
    #include <math.h>
    int main() // nan.c
    {
        printf("0.0/0.0: %f\n", 0.0/0.0);
        printf("inf/inf: %f\n", (1.0/0.0)/(1.0/0.0));
        printf("0.0*inf: %f\n", 0.0*(1.0/0.0));
        printf("-inf + inf: %f\n", (-1.0/0.0) + (1.0/0.0));
        printf("sqrt(-1.0): %f\n", sqrt(-1.0));
        printf("log(-1.0): %f\n", log(-1.0));
        return 0;
    }

NaN
$ cc -Wall nan.c -lm
$ ./a.out
0.0/0.0: -nan
inf/inf: -nan
0.0*inf: -nan
-inf + inf: -nan
sqrt(-1.0): -nan
log(-1.0): nan
$

A Few Programs

int isinfinity(float)
    #include <string.h>

    // differentfloattype.c
    // Returns 1 for +infinity, -1 for -infinity and 0 otherwise.
    int isinfinity(float x){
        unsigned int bits, ess;
        memcpy(&bits, &x, sizeof bits);     // read the bit pattern safely
        ess = (bits & 0x7F800000u) >> 23;   // exponent
        if(ess != 255) return 0;
        ess = bits & 0x007FFFFFu;           // significand
        if(ess != 0) return 0;
        ess = bits >> 31;                   // sign
        if(ess) return -1;
        return 1;
    }

int isnan(float)
It is a similar function in which the significand test if(ess != 0) return 0; is replaced by if(ess == 0) return 0;.