EE 109 Unit 19: IEEE 754 Floating Point Representation and Floating Point Arithmetic


1 EE 109 Unit 19: IEEE 754 Floating Point Representation and Floating Point Arithmetic

2 Floating Point
Used to represent very small numbers (fractions) and very large numbers:
- Avogadro's number: +6.0247 * 10^23
- Planck's constant: +6.6254 * 10^-27
Note: 32- or 64-bit integers can't represent this range.
Floating point representation is used in HLLs like C by declaring variables as float or double.
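As a quick sanity check (sketched in Python, which uses the same IEEE 754 doubles discussed later), the slide's constants sit far outside any fixed-width integer range:

```python
# Avogadro's number and Planck's constant from the slide, as floats.
avogadro = 6.0247e23
planck = 6.6254e-27

# The largest 64-bit two's-complement integer is about 9.2 * 10^18,
# so even a 64-bit int cannot hold Avogadro's number.
INT64_MAX = 2**63 - 1
print(avogadro > INT64_MAX)   # True
print(0 < planck < 1)         # True: floats also reach tiny magnitudes
```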

3 Fixed Point
Unsigned and 2's complement fall under a category of representations called fixed point: the radix point is assumed to be in a fixed location for all numbers. (Note: we could represent fractions by implicitly assuming the binary point is at the left; a variable just stores bits, so you can assume the binary point is anywhere you like.)
- Integers: 10011101. (binary point to the right of the LSB). For 32 bits, the unsigned range is 0 to ~4 billion.
- Fractions: .10011101 (binary point to the left of the MSB). Range: [0 to 1).
Main point: by fixing the radix point, we limit the range of numbers that can be represented. Floating point allows the radix point to be in a different location for each value.
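The "same bits, different assumed point" idea can be sketched in Python, using the slide's bit pattern 10011101:

```python
bits = 0b10011101      # 8 stored bits; the hardware doesn't know where the point is

as_integer  = bits            # 10011101.  (point right of the LSB)
as_fraction = bits / 2**8     # .10011101  (point left of the MSB)

print(as_integer)     # 157
print(as_fraction)    # 0.61328125 = 157/256
```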

4 Floating Point Representation
Similar to scientific notation used with decimal numbers: ±D.DDD * 10^±exp
Floating point representation uses the following form: ±b.bbbb * 2^±exp
Three fields: sign, exponent, fraction (also called mantissa or significand).
Layout: S | Exp. | Fraction (S = overall sign of the number)

5 Normalized FP Numbers
Decimal example: +0.754*10^15 is not correct scientific notation; there must be exactly one significant digit before the decimal point: +7.54*10^14.
In binary the only significant digit is 1, so the normalized FP format is: ±1.bbbbbb * 2^±exp
FP numbers will always be normalized before being stored in memory or a register. The leading "1." is not actually stored but assumed, since we always store normalized numbers.
If hardware calculates a result of 0.001101*2^5, it must normalize to 1.101000*2^2 before storing.

6 IEEE Floating Point Formats
Single precision (32-bit format):
- 1 sign bit (0 = pos / 1 = neg)
- 8 exponent bits (Excess-127 representation; more on next slides)
- 23 fraction (significand or mantissa) bits
- Equivalent decimal range: 7 digits x 10^±38
Double precision (64-bit format):
- 1 sign bit (0 = pos / 1 = neg)
- 11 exponent bits (Excess-1023 representation; more on next slides)
- 52 fraction (significand or mantissa) bits
- Equivalent decimal range: 16 digits x 10^±308
Layout: S | Exp. | Fraction (1 + 8 + 23 bits, or 1 + 11 + 52 bits)
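Python's standard struct module can confirm this field layout; a sketch that packs +1.0 as a single-precision value and splits out the three fields:

```python
import struct

# Pack +1.0 into the 32-bit single-precision format (big-endian).
(word,) = struct.unpack('>I', struct.pack('>f', 1.0))

sign     = word >> 31            # 1 bit
exponent = (word >> 23) & 0xFF   # 8 bits, Excess-127
fraction = word & 0x7FFFFF       # 23 bits

# +1.0 = +1.0 * 2^0, so the stored exponent is 0 + 127 = 127
# and the fraction is all zeros (the leading 1 is implicit).
print(sign, exponent, fraction)    # 0 127 0

# Double precision really is 64 bits (8 bytes).
print(len(struct.pack('>d', 1.0)))   # 8
```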

7 Exponent Representation
The exponent needs its own sign (+/-). Rather than using the 2's complement system, we use Excess-N representation:
- Single precision uses Excess-127
- Double precision uses Excess-1023
This representation allows FP numbers to be easily compared.
Let E' = stored exponent code and E = true exponent value.
- Single precision: E' = E + 127. Example: 2^1 => E = 1, E' = 128 = 10000000 (binary)
- Double precision: E' = E + 1023. Example: 2^-2 => E = -2, E' = 1021 = 01111111101 (binary)
Comparison of 2's complement and Excess-127 (8-bit codes):

2's comp. | stored code E' | Excess-127
   -1     |   1111 1111    |   +128
   -2     |   1111 1110    |   +127
  -128    |   1000 0000    |    +1
  +127    |   0111 1111    |     0
  +126    |   0111 1110    |    -1
   +1     |   0000 0001    |   -126
    0     |   0000 0000    |   -127

Q: Why don't we use Excess-N more to represent negative numbers?
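The bias arithmetic from the slide can be sketched in a couple of lines:

```python
# Excess-N: stored code E' = true exponent E + bias.
def stored_exp(true_exp, bias):
    return true_exp + bias

# Single precision (Excess-127): 2^1 -> E' = 128 = 10000000
print(format(stored_exp(1, 127), '08b'))      # 10000000

# Double precision (Excess-1023): 2^-2 -> E' = 1021 = 01111111101
print(format(stored_exp(-2, 1023), '011b'))   # 01111111101
```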

8 Exponent Representation
FP formats reserve the exponent values of all 1's and all 0's for special purposes. Thus, for single precision, the range of exponents is -126 to +127.

E' (8 bits) | E (= E' - 127) and special values
11111111    | Reserved
11111110    | E' - 127 = +127
10000000    | E' - 127 = +1
01111111    | E' - 127 = 0
01111110    | E' - 127 = -1
00000001    | E' - 127 = -126
00000000    | Reserved

9 IEEE Exponent Special Values

E'      | Fraction                  | Meaning
All 0's | All 0's                   | 0
All 0's | Not all 0's (any bit = 1) | Denormalized (0.fraction x 2^-126)
All 1's | All 0's                   | ±Infinity
All 1's | Not all 0's (any bit = 1) | NaN (Not A Number): 0/0, 0*∞, sqrt(-x)
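These reserved encodings can be checked directly; a sketch that builds the bit patterns from the table and reinterprets them as single-precision values:

```python
import math
import struct

def bits_to_float(word):
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack('>f', struct.pack('>I', word))[0]

print(bits_to_float(0b0_00000000_00000000000000000000000))  # 0.0 (all fields 0)
print(bits_to_float(0b0_11111111_00000000000000000000000))  # inf (exp all 1s, frac 0)

nan = bits_to_float(0b0_11111111_10000000000000000000000)   # exp all 1s, frac != 0
print(math.isnan(nan))                                      # True
```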

10 Single-Precision Examples
Example 1 (decode): 1 10000010 11001100000000000000000
Sign = 1 (negative); E' = 10000010 = 130, so E = 130 - 127 = 3.
Value = -1.1100110 * 2^3 = -1110.011 * 2^0 = -1110.011 = -14.375
Example 2 (encode): +0.6875 = +0.1011 = +1.011 * 2^-1
E' = -1 + 127 = 126 = 01111110
Encoding: 0 01111110 01100000000000000000000
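Both slide examples can be verified with a quick bit-reinterpretation sketch:

```python
import struct

def bits_to_float(word):
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack('>f', struct.pack('>I', word))[0]

# Example 1: sign 1, E' = 10000010 (130), fraction 1100110 0...0
print(bits_to_float(0b1_10000010_11001100000000000000000))   # -14.375

# Example 2: sign 0, E' = 01111110 (126), fraction 011 0...0
print(bits_to_float(0b0_01111110_01100000000000000000000))   # 0.6875
```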

11 Floating Point vs. Fixed Point
Single precision (32 bits): equivalent decimal range of 7 significant decimal digits * 10^±38. Compare that to a 32-bit signed integer, where we can represent only ±2 billion. How does a 32-bit float allow us to represent such a greater range? FP allows for range but sacrifices precision (it can't represent all numbers in its range).
Double precision (64 bits): equivalent decimal range of 16 significant decimal digits * 10^±308.
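The precision sacrifice is easy to see: single precision carries 24 significant bits (23 stored plus the implicit leading 1), so 2^24 + 1 cannot be represented. A sketch that rounds values through single precision:

```python
import struct

def to_float32(x):
    """Round a Python float (a double) through single precision."""
    return struct.unpack('>f', struct.pack('>f', x))[0]

print(to_float32(2**24))        # 16777216.0 -- fits in 24 significant bits
print(to_float32(2**24 + 1))    # 16777216.0 -- the 25th bit is rounded away
print(to_float32(2**24 + 2))    # 16777218.0 -- representable again
```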

12 IEEE Shortened Format
A 12-bit format defined just for this class (it doesn't really exist):
- 1 sign bit (0 = pos, 1 = neg)
- 5 exponent bits using Excess-15 (E' = E + 15, E = E' - 15), with the same reserved codes
- 6 fraction (significand) bits, interpreted as 1.bbbbbb
Layout: S (1 bit) | E (5 bits) | F (6 bits)

13 Examples
Example 1 (decode): 1 10100 101101. E' = 20, E = 20 - 15 = 5. Value = -1.101101 * 2^5 = -110110.1 * 2^0 = -110110.1 = -54.5
Example 2 (encode): +21.75 = +10101.11 = +1.010111 * 2^4. E' = 4 + 15 = 19 = 10011. Encoding: 0 10011 010111
Example 3 (decode): 1 01101 100000. E' = 13, E = 13 - 15 = -2. Value = -1.100000 * 2^-2 = -0.011 * 2^0 = -0.011 = -0.375
Example 4 (encode): +3.625 = +11.101 = +1.110100 * 2^1. E' = 1 + 15 = 16 = 10000. Encoding: 0 10000 110100
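A small decoder for this class-only 12-bit format (a sketch; as the earlier slide notes, the format itself is hypothetical) reproduces all four examples:

```python
def decode12(s, e_stored, f):
    """Decode the class's 12-bit format: 1 sign bit, 5-bit Excess-15
    exponent, 6-bit fraction with an implicit leading 1. Normalized
    codes only; the reserved all-0s/all-1s exponents are not handled."""
    magnitude = (1 + f / 64) * 2 ** (e_stored - 15)
    return -magnitude if s else magnitude

print(decode12(1, 0b10100, 0b101101))   # -54.5   (Example 1)
print(decode12(0, 0b10011, 0b010111))   # 21.75   (Example 2)
print(decode12(1, 0b01101, 0b100000))   # -0.375  (Example 3)
print(decode12(0, 0b10000, 0b110100))   # 3.625   (Example 4)
```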

14 Rounding Methods
+213.125 = 1.1010101001 * 2^7 => we can't keep all of the fraction bits.
Four methods of rounding (you are only responsible for the first two):
- Round to nearest: normal rounding you learned in grade school; round to the nearest representable number. If exactly halfway between, round to the representable value with 0 in the LSB.
- Round toward 0 (chopping): round to the representable value closest to, but not greater in magnitude than, the precise value. Equivalent to just dropping the extra bits.
- Round toward +∞ (round up): round to the closest representable value greater than the number.
- Round toward -∞ (round down): round to the closest representable value less than the number.

15 Number Line View of Rounding Methods
[Figure: number lines for Round to Nearest, Round to Zero, Round to +Infinity, and Round to -Infinity, with example values -3.75 and +5.8. Green lines are numbers that fall between two representable values (dots) and thus need to be rounded.]

16 Rounding Implementation
There may be a large number of bits after the fraction. To implement any of the methods, we keep only a subset of the extra bits after the fraction (hardware is finite):
- Guard bit(s) (G): bits immediately after the LSB of the fraction (in this class we will usually keep only 1 guard bit)
- Round bit (R): the bit to the right of the guard bit(s)
- Sticky bit (S): the logical OR of all other bits after the G and R bits (output is 1 if any input is 1, 0 otherwise)
Example: 1.01001010010 x 2^4 is kept as 1.010010|101 x 2^4, where the last three bits are G, R, S. We can perform rounding to a 6-bit fraction using just these 3 bits.

17 Rounding to Nearest Method
Same idea as rounding in decimal: .51 and up rounds up; .49 and down rounds down. In decimal, exactly .50 rounds up, but in this method we treat it differently: if the precise value is exactly halfway between two representable values, round toward the number with 0 in the LSB.

18 Round to Nearest Method
Round to the closest representable value. If the precise value is exactly halfway between two representable values, round toward the number with 0 in the LSB.
Example: 1.11111011010 x 2^4 is kept as 1.111110|111 x 2^4 (G,R,S = 1,1,1). The precise value lies between +1.111110 x 2^4 and +1.111111 x 2^4; it is rounded up to +1.111111 x 2^4 because it is closer to the next higher representable value.

19 Rounding to Nearest Method
Three cases in binary FP:
- G = 1 and (R,S) ≠ (0,0): round the fraction up (add 1 to the fraction); may require a renormalization
- G = 1 and (R,S) = (0,0): round to the closest fraction value with a 0 in the LSB; may require a renormalization
- G = 0: leave the fraction alone (add 0 to the fraction)

20 Round to Nearest
Example 1: 1.001100|110 x 2^4. G = 1 and (R,S) ≠ (0,0), so round up (fraction + 1): 0 10011 001101
Example 2: 1.111111|101 x 2^4. G = 1 and (R,S) ≠ (0,0), so round up: 1.111111 + 0.000001 = 10.000000 x 2^4, which requires renormalization to 1.000000 x 2^5: 0 10100 000000
Example 3: 1.001101|001 x 2^4. G = 0, so leave the fraction: 0 10011 001101

21 Round to Nearest
In all these cases (G = 1 and R,S = 0,0), the precise value is exactly halfway between the two possible rounded values, so we round to the value with 0 in the LSB.
Example 1: 1.001100|100 x 2^4. Options are 1.001100 or 1.001101; round down (LSB already 0): 0 10011 001100
Example 2: 1.111111|100 x 2^4. Options are 1.111111 or 10.000000; round up: 1.111111 + 0.000001 = 10.000000 x 2^4, renormalized to 1.000000 x 2^5: 0 10100 000000
Example 3: 1.001101|100 x 2^4. Options are 1.001101 or 1.001110; round up (to the 0 LSB): 0 10011 001110
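The three rounding cases and tie-breaking examples above can be sketched as a small Python helper operating on a 6-bit fraction (renormalization on overflow is left to the caller, as in the slides):

```python
def round_nearest(frac6, g, r, s):
    """Round-to-nearest on a 6-bit fraction with guard/round/sticky bits.
    Returns a 7-bit value when rounding overflows (caller renormalizes)."""
    if g == 1 and (r, s) != (0, 0):
        return frac6 + 1             # more than halfway: round up
    if g == 1 and (r, s) == (0, 0):
        return frac6 + (frac6 & 1)   # exactly halfway: round to even (0) LSB
    return frac6                     # g == 0: less than halfway, keep as is

print(format(round_nearest(0b001100, 1, 1, 0), '06b'))  # 001101 (round up)
print(format(round_nearest(0b001100, 1, 0, 0), '06b'))  # 001100 (tie: LSB already 0)
print(format(round_nearest(0b001101, 1, 0, 0), '06b'))  # 001110 (tie: go to 0 LSB)
print(format(round_nearest(0b111111, 1, 0, 1), '07b'))  # 1000000 (overflow: renormalize)
```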

22 Round to 0 (Chopping)
Simply drop the G, R, S bits and take the fraction as is:
- 1.001100|001 x 2^4 -> 0 10011 001100
- 1.001101|101 x 2^4 -> 0 10011 001101
- 1.001100|111 x 2^4 -> 0 10011 001100

23 Important Warning for Programmers
FP addition/subtraction is NOT associative. Because of rounding and the inability to precisely represent fractions, (a+b)+c ≠ a+(b+c); in particular, (small + LARGE) - LARGE ≠ small + (LARGE - LARGE). Why? Because of rounding and special values like Inf.
Example (7 significant decimal digits):
(0.0001 + 98475) - 98474 = 98475 - 98474 = 1 (the 0.0001 is rounded away)
0.0001 + (98475 - 98474) = 0.0001 + 1 = 1.0001
Another example: 1 + 1.11...1*2^127 = 1.11...1*2^127 (adding 1 changes nothing at that magnitude).
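The slide's example can be reproduced by forcing each intermediate result through single precision (a Python sketch standing in for C float arithmetic):

```python
import struct

def f32(x):
    """Round through single precision, mimicking a C float."""
    return struct.unpack('>f', struct.pack('>f', x))[0]

small, large1, large2 = 0.0001, 98475.0, 98474.0

left  = f32(f32(small + large1) - large2)   # (small + LARGE) - LARGE
right = f32(small + f32(large1 - large2))   # small + (LARGE - LARGE)

print(left)           # 1.0 -- the 0.0001 was rounded away
print(right > 1.0)    # True -- about 1.0001; the 0.0001 survived
print(left == right)  # False: float addition is not associative
```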