EE 109 Unit 20 - IEEE 754 Floating Point Representation and Floating Point Arithmetic


Floating Point

Floating point is used to represent very small numbers (fractions) and very large numbers, for example:
- Avogadro's Number: +6.0247 * 10^23
- Planck's Constant: +6.6254 * 10^-27
Note: 32- or 64-bit integers can't represent this range. Floating point representation is used in HLLs like C by declaring variables as float or double.

Fixed Point

Unsigned and 2's complement fall under a category of representations called "fixed point": the radix point is assumed to be in a fixed location for all numbers. [Note: we could represent fractions by implicitly assuming the binary point is at the left; a variable just stores bits, so you can assume the binary point is anywhere you like.]
- Integers: 11111. (binary point to the right of the LSB). For 32 bits, the unsigned range is 0 to ~4 billion.
- Fractions: .11111 (binary point to the left of the MSB). The range is [0 to 1).
Main point: by fixing the radix point, we limit the range of numbers that can be represented.

Floating Point Representation

Floating point allows the radix point to be in a different location for each value, similar to scientific notation used with decimal numbers. Floating point representation uses the following form:

    +/- b.bbbb * 2^(+/- exp)

There are 3 fields: Sign (the overall sign of the number), Exponent, and Fraction (also called the mantissa or significand).
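As a concrete illustration of the range point above, here is a small C sketch (not part of the original slides) that prints the limits of a 32-bit int and of float/double and declares float/double variables for the two constants; the variable names are just illustrative.

    #include <stdio.h>
    #include <float.h>   /* FLT_MAX, DBL_MAX */
    #include <limits.h>  /* INT_MAX */

    int main(void) {
        /* A 32-bit int tops out around +2 billion, while a 32-bit float
           (the same amount of storage) reaches values near 3.4 * 10^38. */
        printf("INT_MAX = %d\n", INT_MAX);
        printf("FLT_MAX = %e\n", FLT_MAX);   /* ~3.402823e+38  */
        printf("DBL_MAX = %e\n", DBL_MAX);   /* ~1.797693e+308 */

        /* HLLs like C expose floating point via float and double. */
        float  planck   = 6.6254e-27f;
        double avogadro = 6.0247e23;
        printf("planck = %e, avogadro = %e\n", planck, avogadro);
        return 0;
    }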

Normalized FP Numbers

Decimal example: +0.754 * 10^15 is not correct scientific notation; there must be exactly one significant digit before the decimal point (+7.54 * 10^14). In binary, the only significant digit is 1, so the normalized FP format is:

    +/- 1.bbbbbb * 2^(+/- exp)

FP numbers will always be normalized before being stored in memory or a register. The leading 1 is actually not stored but assumed, since we will only store normalized numbers. If the hardware calculates a result such as 0.001011 * 2^5, it must normalize it to 1.011 * 2^2 before storing.

IEEE Floating Point Formats

Single precision (32-bit format):
- Sign bit (0 = pos / 1 = neg)
- Exponent: 8 bits, Excess-127 representation (more on the next slides)
- Fraction (significand or mantissa): 23 bits
- Field layout: S | Exp (8 bits) | Fraction (23 bits)
- Equivalent decimal range: ~7 significant digits x 10^(+/-38)

Double precision (64-bit format):
- Sign bit (0 = pos / 1 = neg)
- Exponent: 11 bits, Excess-1023 representation (more on the next slides)
- Fraction (significand or mantissa): 52 bits
- Field layout: S | Exp (11 bits) | Fraction (52 bits)
- Equivalent decimal range: ~16 significant digits x 10^(+/-308)

Exponent Representation

The exponent needs its own sign (+/-). Rather than using a 2's complement system, we use Excess-N representation: single precision uses Excess-127 and double precision uses Excess-1023. This representation allows FP numbers to be easily compared.

Let E' = stored exponent code and E = true exponent value.
- Single precision: E' = E + 127. Example: 2^1 => E = 1, E' = 128 (decimal) = 1000 0000 (binary).
- Double precision: E' = E + 1023. Example: 2^-2 => E = -2, E' = 1021 (decimal) = 011 1111 1101 (binary).

Comparison of the 2's complement and Excess-127 interpretations of an 8-bit exponent code E':

    2's comp.   E' (stored exp.)   Excess-127
    -1          1111 1111          +128 (reserved)
    -2          1111 1110          +127
    ...         ...                ...
    -128        1000 0000          +1
    +127        0111 1111          0
    +126        0111 1110          -1
    ...         ...                ...
    +1          0000 0001          -126
    0           0000 0000          -127 (reserved)

Q: Why don't we use Excess-N more widely to represent negative numbers?

The FP formats reserve the exponent codes of all 1's and all 0's for special values. Thus, for single precision, the range of true exponents E (= E' - 127) is -126 to +127.
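To see the Excess-127 stored exponent E' on real hardware, here is a hedged C sketch (not from the slides; the helper name dump_float is made up) that copies a float's 32 bits into an integer and splits out the sign, E', and fraction fields exactly as laid out above.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Print the sign, stored exponent E' (Excess-127), true exponent E,
       and 23-bit fraction of an IEEE 754 single-precision value. */
    static void dump_float(float f) {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);            /* reinterpret the 32 bits */
        unsigned sign   = bits >> 31;
        unsigned eprime = (bits >> 23) & 0xFFu;    /* stored exponent E' */
        unsigned frac   = bits & 0x7FFFFFu;        /* 23 fraction bits */
        int e = (int)eprime - 127;                 /* true exponent E = E' - 127 */
        printf("%-10g S=%u E'=%3u (E=%4d) F=0x%06X\n", f, sign, eprime, e, frac);
    }

    int main(void) {
        dump_float(2.0f);    /* 1.0 * 2^1   -> E' = 128 = 1000 0000 */
        dump_float(0.25f);   /* 1.0 * 2^-2  -> E' = 125 */
        dump_float(-1.0f);   /* sign bit set, E' = 127 */
        return 0;
    }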

IEEE Exponent Special Values

The all-0's and all-1's exponent codes are reserved:
- E' = all 0's, Fraction = 0: the value 0
- E' = all 0's, Fraction != 0: a denormalized number
- E' = all 1's, Fraction = 0: +/- infinity
- E' = all 1's, Fraction != 0: NaN (Not a Number)

Single-Precision Example

+0.6875 = +0.1011 (binary) = +1.011 * 2^-1 after normalizing. The stored fields are S = 0, E' = -1 + 127 = 126 = 0111 1110, and Fraction = 011 0000 0000 0000 0000 0000.

Floating Point vs. Fixed Point

Single precision (32 bits) has an equivalent decimal range of ~7 significant decimal digits * 10^(+/-38). Compare that to a 32-bit signed integer, where we can represent only about +/-2 billion. How does a 32-bit float allow us to represent such a greater range? FP allows for greater range but sacrifices precision (it can't represent every number in its range). Double precision (64 bits) has an equivalent decimal range of ~16 significant decimal digits * 10^(+/-308).

IEEE Shortened Format

A 12-bit format defined just for this class (it doesn't really exist):
- 1 sign bit (0 = pos, 1 = neg)
- 5 exponent bits using Excess-15 (E' = E + 15, E = E' - 15), with the same reserved codes
- 6 fraction (significand) bits
- Field layout: S | E (5 bits) | F (6 bits)
- Values have the normalized form 1.bbbbbb * 2^E
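The reserved codes are easy to generate. A minimal C sketch (mine, not the slides'), assuming the usual IEEE 754 behavior where a nonzero float divided by zero gives infinity and 0/0 gives NaN (some compilers warn about the division):

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        float zero = 0.0f;
        float inf  = 1.0f / zero;   /* E' all 1's, F = 0  -> +infinity */
        float nan_ = zero / zero;   /* E' all 1's, F != 0 -> NaN */
        float tiny = 1e-45f;        /* E' all 0's, F != 0 -> denormalized */

        printf("inf  = %f (isinf = %d)\n", inf, isinf(inf));
        printf("nan  = %f (isnan = %d)\n", nan_, isnan(nan_));
        printf("tiny = %e\n", tiny);
        return 0;
    }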

Examples (12-bit class format)

- +21.75 = +10101.11 (binary) = +1.010111 * 2^4 => S = 0, E' = 4 + 15 = 19 = 10011, F = 010111
- +3.625 = +11.101 (binary) = +1.1101 * 2^1 => S = 0, E' = 1 + 15 = 16 = 10000, F = 110100
- +213.125 = +11010101.001 (binary) = +1.1010101001 * 2^7 => we can't keep all of the fraction bits (only 6 are available), so the result must be rounded.

Rounding Methods

4 methods of rounding (you are only responsible for the first 2):
- Round to Nearest: the normal rounding you learned in grade school. Round to the nearest representable number; if exactly halfway between two representable values, round to the one with 0 in the LSB.
- Round toward 0 (chopping): round to the representable value closest to, but not greater in magnitude than, the precise value. Equivalent to just dropping the extra bits.
- Round toward +infinity (round up): round to the closest representable value greater than the number.
- Round toward -infinity (round down): round to the closest representable value less than the number.

Number Line View of Rounding Methods

[Figure: number lines for Round to Nearest, Round toward 0, Round toward +infinity, and Round toward -infinity. The green lines are values (e.g., -3.75 and +5.8) that fall between two representable values (dots) and thus need to be rounded.]

Rounding Implementation

There may be a large number of bits after the fraction. To implement any of the rounding methods, the hardware keeps only a small subset of the extra bits beyond the fraction (hardware is finite):
- Guard bit(s): the bit(s) immediately after the LSB of the fraction (in this class we will usually keep only 1 guard bit)
- Round bit: the bit to the right of the guard bit(s)
- Sticky bit: the OR of all other bits after the G and R bits

A long intermediate result such as 1.bbbbbb bbbb... x 2^4 is therefore kept as its 6-bit fraction plus the G, R, and S bits; we can perform rounding to a 6-bit fraction using just these 3 bits.
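These four methods correspond to the IEEE rounding modes that C99 exposes through <fenv.h>. The sketch below (not from the slides) rounds the number-line example values +5.8 and -3.75 to integers under each mode using rintf(), which honors the current rounding direction; it assumes an implementation that defines all four FE_* macros.

    #include <stdio.h>
    #include <fenv.h>
    #include <math.h>

    /* Round x to an integer in the requested rounding mode. */
    static void show(int mode, const char *name, float x) {
        fesetround(mode);                    /* select the rounding direction */
        printf("%-12s %6.2f -> %5.1f\n", name, x, rintf(x));
    }

    int main(void) {
        float a = 5.8f, b = -3.75f;
        show(FE_TONEAREST,  "to nearest",  a);  show(FE_TONEAREST,  "to nearest",  b);
        show(FE_TOWARDZERO, "toward 0",    a);  show(FE_TOWARDZERO, "toward 0",    b);
        show(FE_UPWARD,     "toward +inf", a);  show(FE_UPWARD,     "toward +inf", b);
        show(FE_DOWNWARD,   "toward -inf", a);  show(FE_DOWNWARD,   "toward -inf", b);
        return 0;
    }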

Round to Nearest Method

Same idea as rounding in decimal: .51 and up rounds up, .49 and down rounds down. In decimal, exactly .50 rounds up; in this method we treat that case differently: if the precise value is exactly halfway between 2 representable values, we round toward the number with 0 in the LSB.

Otherwise we simply round to the closest representable value. The precise value will be rounded to one of the two representable values it lies between; for example, if the G, R, and S bits show that the precise value is closer to the next higher representable value, we round up.

3 cases in binary FP, based on the guard (G), round (R), and sticky (S) bits:
- G = 1 and (R or S) = 1 => round the fraction up (add 1 to the fraction); this may require a re-normalization.
- G = 1 and R = S = 0 => exactly halfway; round to the fraction value with a 0 in the LSB; this may also require a re-normalization.
- G = 0 => leave the fraction alone (add 0 to the fraction).
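A minimal C sketch of the 3-case decision above (not from the slides; the function name round_nearest_even is made up), operating on a 6-bit fraction plus the G, R, and S bits:

    #include <stdio.h>

    /* Apply round-to-nearest to a 6-bit fraction given the G, R, S bits.
       If the returned value overflows 6 bits (== 0x40), the caller must
       re-normalize (shift the fraction right and increment the exponent). */
    static unsigned round_nearest_even(unsigned frac6, int g, int r, int s) {
        if (g == 0)                       /* case: below halfway -> truncate */
            return frac6;
        if (r == 1 || s == 1)             /* case: above halfway -> round up */
            return frac6 + 1;
        return (frac6 & 1) ? frac6 + 1    /* case: exactly halfway -> force  */
                           : frac6;       /*       a 0 into the LSB          */
    }

    int main(void) {
        /* 101101 with GRS = 100: halfway and LSB is 1, so round up to 101110. */
        printf("0x%02X\n", round_nearest_even(0x2D, 1, 0, 0));   /* prints 0x2E */
        /* 101100 with GRS = 100: halfway and LSB already 0, so keep 101100.   */
        printf("0x%02X\n", round_nearest_even(0x2C, 1, 0, 0));   /* prints 0x2C */
        return 0;
    }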

In all the halfway cases (G = 1, R = S = 0), the number lies exactly between the 2 possible rounded values, so we round to the value with 0 in the LSB.

Round toward 0 (Chopping)

Simply drop the G, R, and S bits and take the fraction as is.

Important Warning for Programmers

FP addition/subtraction is NOT associative. Because of rounding and the inability to precisely represent fractions,

    (a + b) + c  !=  a + (b + c)
    (small + LARGE) - LARGE  !=  small + (LARGE - LARGE)

Why? Because of rounding and special values like Inf. For example, in single precision:

    (0.0001 + 98475) - 98474  =  98475 - 98474  =  1
    0.0001 + (98475 - 98474)  =  0.0001 + 1     =  1.0001

Another example: (1 + 1.11...1 * 2^127) - 1.11...1 * 2^127 evaluates to 0 (the 1 is lost to rounding), while 1 + (1.11...1 * 2^127 - 1.11...1 * 2^127) evaluates to 1.

USING INTEGERS TO EMULATE DECIMAL/FRACTION ARITHMETIC
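The warning is easy to reproduce. This C sketch (not from the slides) evaluates the two groupings of the example above in single precision:

    #include <stdio.h>

    int main(void) {
        float small = 0.0001f, big1 = 98475.0f, big2 = 98474.0f;

        /* 98475.0001 is not representable in single precision, so the
           left grouping rounds the sum back to 98475 and loses 'small'. */
        float left  = (small + big1) - big2;   /* ~1.0    */
        float right = small + (big1 - big2);   /* ~1.0001 */

        printf("(small + LARGE) - LARGE = %.6f\n", left);
        printf("small + (LARGE - LARGE) = %.6f\n", right);
        return 0;
    }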

Option 1 for Performing FP

Problem: suppose you need to add, subtract, and compare numbers with fractional parts, e.g., amounts of money (dollars and cents).

Option 1: just use 'float' or 'double' variables in your code.
- Pro: easy to implement and easy to work with.
- Con: some processors, like the one on the Arduino, don't have hardware support for FP and so would need to include a lot of code to emulate FP operations.
- Con: numbers like $12.50 can be represented exactly, but most numbers are only approximated due to rounding of fractions that can't be represented in a finite number of bits (e.g., 0.1 decimal can't be represented exactly in binary).

    float x, y, z;
    if (x > y)
        z = z + x;

Option 2 for Performing FP

Option 2: use 'int's to emulate FP by splitting the amounts into two variables, one for the dollars and one for the cents ($12.53 -> 12 and 53).
- Pro: everything is done with fixed-point operations, no FP.
- Con: all operations require at least two fixed-point operations (though this is probably still faster than the code the compiler would include for Option 1).

    /* add two numbers */
    z_cents = x_cents + y_cents;
    z_dollars = x_dollars + y_dollars;
    if (z_cents >= 100) {
        z_dollars++;
        z_cents -= 100;
    }

Option 3 for Performing FP

Option 3: use a single fixed-point 'int' value that combines the integer and decimal parts by multiplying the amounts by 100 (e.g., $12.53 -> 1253).
- Pro: all adds, subtracts, and compares are done as a single fixed-point operation.
- Con: must convert to this form on input and back to dollars and cents on output.

    int x_amount = x_dollars*100 + x_cents;
    int y_amount = y_dollars*100 + y_cents;
    int z_amount;
    if (x_amount > y_amount)
        z_amount = z_amount + x_amount;   /* mirrors z = z + x above */
    int z_dollars = z_amount / 100;
    int z_cents   = z_amount % 100;

Emulating Floating Point

Let's examine Option 3 a bit more. Suppose we store the integer and fraction parts of a number in separate variables. For the desired decimal operation 12.75 + 6.50 (separate integer / fraction parts: 12 | 75 and 6 | 50), just adding the integer portions would give 18; if we account for the fractions in the inputs of our operation we get 19.25 (which could then be truncated to 19). We would prefer the more precise answer that includes the fractional parts of the inputs.
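To make the 'can't represent 0.1' con of Option 1 and the benefit of Option 3 concrete, here is a hedged C sketch (not from the slides) that adds ten cents a thousand times, once with a float and once with the amount held in an int as cents:

    #include <stdio.h>

    int main(void) {
        /* Option 1 pitfall: 0.10 is not exact in binary, so the float
           total drifts slightly away from the expected 100.00. */
        float total = 0.0f;
        for (int i = 0; i < 1000; i++)
            total += 0.10f;
        printf("float total: %.6f (expected 100.000000)\n", total);

        /* Option 3 style: keep the amount in cents as an int -> exact. */
        int cents = 0;
        for (int i = 0; i < 1000; i++)
            cents += 10;                  /* ten cents each iteration */
        printf("int cents  : %d.%02d\n", cents / 100, cents % 100);
        return 0;
    }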

Example and Procedure

Imagine we took our numbers, multiplied each by 100, performed the operation, and then divided by 100:

    100 * (12.75 + 6.50) = 1275 + 650 = 1925  =>  1925 / 100 = 19.25  =>  19

Procedure (binary version):
1. Assemble the pieces into one variable: shift the integer to the left, then OR in the fraction bits.
2. Perform the desired int + frac operation (shift the integers and add in the fractions).
3. Disassemble the integer and fraction results by shifting back (if only the integer is desired, simply shift right to drop the fraction).

Another Example

Suppose we want to convert from "dozens" to the number of individual items (i.e., 1.25 dozen = 15 items). Simple formula: 12 * dozens = individual items. Suppose we only support the fractions 1/4, 1/2, and 3/4, represented with 2 binary fraction bits:

    Decimal view:  3.25    3.50    3.75
    Binary view:   11.01   11.10   11.11

Procedure:
1. Assemble the int + frac pieces into 1 variable (shift the integer to the left, then add/OR in the fraction bits).
2. Perform the desired arithmetic operation.
3. Disassemble the int + frac result by shifting back.

Worked example for 3.25 dozen (12 * 3.25 = 39 items):

    Decimal view:  3.25 -> 13 quarters; 12 * 13 = 156; 156 / 4 = 39
    Binary view:   11.01 -> assemble (shift left 2) -> 1101; 1101 * 1100 = 10011100; shift right 2 -> 100111 = 39
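The dozens conversion maps directly to C shifts and ORs. A minimal sketch (not from the slides), assuming 2 fraction bits so that values are held in units of 1/4:

    #include <stdio.h>

    int main(void) {
        /* Assemble 3.25 into one variable: integer 3 shifted left 2 bits,
           then OR in the fraction bits 01 (= 1/4). Result: 1101 = 13. */
        int dozens_q = (3 << 2) | 1;

        /* Perform the operation while still scaled by 4: 12 * 13 = 156. */
        int items_q = 12 * dozens_q;

        /* Disassemble: shift right 2 bits to drop the fraction -> 39. */
        int items = items_q >> 2;

        printf("3.25 dozen = %d items\n", items);
        return 0;
    }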