FLOATING POINT FORMATS

Similar documents
CS 265. Computer Architecture. Wei Lu, Ph.D., P.Eng.

CREATING FLOATING POINT VALUES IN MIL-STD-1750A 32 AND 48 BIT FORMATS: ISSUES AND ALGORITHMS

Numeric Encodings Prof. James L. Frankel Harvard University

Name Student Activity

Orifice Flow Meter

Science, Technology, Engineering & Maths (STEM) Calculator Guide Texas Instruments TI-Nspire CX Handheld (Operating System 4.4)

Floating-point Arithmetic. where you sum up the integer to the left of the decimal point and the fraction to the right.

CHAPTER V NUMBER SYSTEMS AND ARITHMETIC

CS/EE1012 INTRODUCTION TO COMPUTER ENGINEERING SPRING 2013 HOMEWORK I. Solve all homework and exam problems as shown in class and sample solutions

CS101 Lecture 04: Binary Arithmetic

IEEE Standard for Floating-Point Arithmetic: 754

±M R ±E, S M CHARACTERISTIC MANTISSA 1 k j

Introduction to Computers and Programming. Numeric Values

NAME: 1a. (10 pts.) Describe the characteristics of numbers for which this floating-point data type is well-suited. Give an example.

Description Hex M E V smallest value > largest denormalized negative infinity number with hex representation 3BB0 ---

COMP Overview of Tutorial #2

IBM 370 Basic Data Types

Floating Point Arithmetic

Rui Wang, Assistant professor Dept. of Information and Communication Tongji University.

Number Systems. Binary Numbers. Appendix. Decimal notation represents numbers as powers of 10, for example

CSCI 402: Computer Architectures. Arithmetic for Computers (3) Fengguang Song Department of Computer & Information Science IUPUI.

Comparing Linear and Exponential Expressions ALGEBRA NSPIRED: EXPONENTIAL FUNCTIONS

Chapter 3: Arithmetic for Computers

Digital Signal Processing Introduction to Finite-Precision Numerical Effects

Addition and Subtraction of. Rational Numbers Part 2. Name. Class. Student Activity. Open the TI-Nspire document Add_Sub_Rational_Numbers_Part2.tns.

Science, Technology, Engineering & Maths (STEM) Calculator Guide Texas Instruments TI-Nspire Handheld (Touchpad with Operating System 3.

TEACHER: CREATE PRACTICE QUIZ

INTEGER REPRESENTATIONS

Number Systems and Binary Arithmetic. Quantitative Analysis II Professor Bob Orr

Floating Point. The World is Not Just Integers. Programming languages support numbers with fraction

Chapter Three. Arithmetic

3 Data Storage 3.1. Foundations of Computer Science Cengage Learning

Number Systems Prof. Indranil Sen Gupta Dept. of Computer Science & Engg. Indian Institute of Technology Kharagpur Number Representation

Number Systems Standard positional representation of numbers: An unsigned number with whole and fraction portions is represented as:

15213 Recitation 2: Floating Point

CHW 261: Logic Design

Binary representation of integer numbers Operations on bits

ECE232: Hardware Organization and Design

Number System (1A) Young Won Lim 7/7/10

CO212 Lecture 10: Arithmetic & Logical Unit

Implementation of IEEE754 Floating Point Multiplier

Chapter 03: Computer Arithmetic. Lesson 09: Arithmetic using floating point numbers

Objectives. Connecting with Computer Science 2

Computer Architecture and Organization. Chapter 2 Data Representation!

C Fast RTS Library User Guide (Rev 1.0)

Lecture 25: Other Number Systems

Module 2: Computer Arithmetic

Signed Multiplication Multiply the positives Negate result if signs of operand are different

COMP2121: Microprocessors and Interfacing. Number Systems

UNIT 7A Data Representation: Numbers and Text. Digital Data

Graphics calculator instructions

COMP2611: Computer Organization. Data Representation

Number Systems. Decimal numbers. Binary numbers. Chapter 1 <1> 8's column. 1000's column. 2's column. 4's column

Proof of Identities TEACHER NOTES MATH NSPIRED. Math Objectives. Vocabulary. About the Lesson. TI-Nspire Navigator System

Set Theory in Computer Science. Binary Numbers. Base 10 Number. What is a Number? = Binary Number Example

Homework 1 graded and returned in class today. Solutions posted online. Request regrades by next class period. Question 10 treated as extra credit

Repeating Decimals and Fractions

Digital Fundamentals

ENGG 1203 Tutorial. Computer Arithmetic (1) Computer Arithmetic (3) Computer Arithmetic (2) Convert the following decimal values to binary:

IEEE Standard 754 Floating Point Numbers

Numeric Variable Storage Pattern

Number Systems & Encoding

TI-84+ GC 3: Order of Operations, Additional Parentheses, Roots and Absolute Value

Administrivia. CMSC 216 Introduction to Computer Systems Lecture 24 Data Representation and Libraries. Representing characters DATA REPRESENTATION

TUTORIAL -1 COMPUTER ORGANISATION TIT-402

1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM

Representation of Numbers and Arithmetic in Signal Processors

Machine Arithmetic 8/31/2007

Floating Point Numbers. Lecture 9 CAP

The type of all data used in a C (or C++) program must be specified

Floating-Point Data Representation and Manipulation 198:231 Introduction to Computer Organization Lecture 3

CS 261 Fall Floating-Point Numbers. Mike Lam, Professor.

3.5 Floating Point: Overview

TI-30Xa SOLAR School Edition

Binary Addition & Subtraction. Unsigned and Sign & Magnitude numbers

Graphics calculator instructions

M1 Computers and Data

CHAPTER 5: Representing Numerical Data

CS 261 Fall Floating-Point Numbers. Mike Lam, Professor.

l l l l l l l Base 2; each digit is 0 or 1 l Each bit in place i has value 2 i l Binary representation is used in computers

Embedded Target for TI C6000 DSP 2.0 Release Notes

Organisasi Sistem Komputer

ECE 30 Introduction to Computer Engineering

CSE 1320 INTERMEDIATE PROGRAMMING - OVERVIEW AND DATA TYPES

CMPSCI 145 MIDTERM #1 Solution Key. SPRING 2017 March 3, 2017 Professor William T. Verts

FLOATING POINT NUMBERS

CMPSCI 145 MIDTERM #2 SPRING 2017 April 7, 2017 Professor William T. Verts NAME PROBLEM SCORE POINTS GRAND TOTAL 100

Inf2C - Computer Systems Lecture 2 Data Representation

Recap from Last Time. CSE 2021: Computer Organization. It s All about Numbers! 5/12/2011. Text Pictures Video clips Audio

ECE 2020B Fundamentals of Digital Design Spring problems, 6 pages Exam Two Solutions 26 February 2014

Floating Point Representation. CS Summer 2008 Jonathan Kaldor

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

ECE331: Hardware Organization and Design

Floating-point representations

Floating-point representations

CSE 140 Homework One

TI-30Xa/30Xa Solar, English

Floating Point. EE 109 Unit 20. Floating Point Representation. Fixed Point

9/3/2015. Data Representation II. 2.4 Signed Integer Representation. 2.4 Signed Integer Representation

Mid-term Examination

Transcription:

APPENDIX O FLOATING POINT FORMATS Paragraph Title Page 1.0 Introduction... O-1 2.0 IEEE 754 32 Bit Single Precision Floating Point... O-2 3.0 IEEE 754 64 Bit Double Precision Floating Point... O-2 4.0 MIL STD 1750A 32 Bit Single Precision Floating Point... O-2 5.0 MIL STD 1750A 48 Bit Double Precision Floating Point... O-3 6.0 DEC 32 Bit Single Precision Floating Point... O-3 7.0 DEC 64 Bit Double Precision Floating Point... O-3 8.0 DEC 64 Bit G Double Precision Floating Point... O-4 9.0 IBM 32 Bit Single Precision Floating Point... O-4 10.0 IBM 64 Bit Double Precision Floating Point... O-4 11.0 TI (Texas Instruments) 32 Bit Single Precision Floating Point... O-5 12.0 TI (Texas Instruments) 40 Bit Extended Precision Floating Point... O-5 LIST OF TABLES Table O-1. Floating Point Formats... O-1 CHANGES TO THIS EDITION OF APPENDIX O Updates to this Appendix are in blue font. The blue font indicates either a completely new item or an update to a previously existing item. Paragraph 8.0 is entirely new.

This page intentionally left blank. ii

1.0 Introduction APPENDIX O FLOATING POINT FORMATS Table O-1 provides a summary of floating point formats. Details of each format are shown on the pages following the table. TABLE O-1. FLOATING POINT FORMATS Type Size Radix Sign Exponent Fraction Bias Formula IEEE_32 32 2 1 8 23 127 (-1 S )(1.F)(2 (E-127) ) IEEE_64 64 2 1 11 52 1023 (-1 S )(1.F)(2 (E-1023) ) 1750A_32 32 2 0 8 24 0 (0.F)(2 E ) 1750A_48 48 2 0 8 40 0 (0.F)(2 E ) DEC_32 32 2 1 8 23 128 (-1 S )(0.1F)(2 (E-128) ) DEC_64 64 2 1 8 55 128 (-1 S )(0.1F)(2 (E-128) ) DEC_64G 64 2 1 11 52 1024 (-1 S )(0.1F)(2 (E-1024) ) IBM_32 32 16 1 7 24 64 (-1 S )(0.F)(16 (E-64) ) IBM_64 64 16 1 7 56 64 (-1 S )(0.F)(16 (E-64) ) TI_32 32 2 1 8 24 0 ((-2) S + (0.F))(2 E ) TI_40 40 2 1 8 32 0 ((-2) S + (0.F))(2 E ) O-1

2.0 IEEE 754 32 Bit Single Precision Floating Point 1 2 9 10 32 2-1 2-23 Value = (-1 S )(1.F)(2 (E-127) ) Exponent = power of 2 with bias of 127 Fraction = F portion of 23 bit fraction 1.F 0: E = 0, F = 0 3.0 IEEE 754 64 Bit Double Precision Floating Point 1 2 12 13 64 2-1 2-52 Value = (-1 S )(1.F)(2 (E-1023) ) Exponent = power of 2 with bias of 1023 Fraction = F portion of 52 bit fraction 1.F 0: E = 0, F = 0 4.0 MIL STD 1750A 32 Bit Single Precision Floating Point S Fraction Exponent 1 2 24 25 32 2-1 2-23 Value = (0.F)(2 E ) S + Fraction = Normalized, 2's complement F portion of 24 bit fraction 0.F (Bit 2 MUST be set for positive, clear for negative) O-2

5.0 MIL STD 1750A 48 Bit Double Precision Floating Point S Fraction (MSW) Exponent Fraction 1 2 24 25 32 33 48 2-1 2-23 2-24 2-31 Value = (0.F)(2 E ) S + Fraction = Normalized, 2's complement F portion of 40 bit fraction 0.F (Bit 2 MUST be set for positive, clear for negative) 6.0 DEC 32 Bit Single Precision Floating Point 1 2 9 10 32 2-2 2-24 Value = (-1 S )(0.1F)(2 (E-128) ) Exponent = power of 2 with bias of 128 Fraction = F portion of 23 bit fraction 0.1F 0: S = 0 & F = 0 & E = 0 7.0 DEC 64 Bit Double Precision Floating Point 1 2 9 10 64 2-2 2-56 Value = (-1 S )(0.1F)(2 (E-128) ) Exponent = power of 2 with bias of 128 Fraction = F portion of 55 bit fraction 0.1F 0: S = 0 & F = 0 & E = 0 O-3

8.0 DEC 64 Bit G Double Precision Floating Point 1 2 12 13 64 2-2 2-53 Value = (-1 S )(0.1F)(2 (E-1024) ) Exponent = power of 2 with bias of 1024 Fraction = F portion of 52 bit fraction 0.1F 0: S = 0 & F = 0 & E = 0 9.0 IBM 32 Bit Single Precision Floating Point 1 2 8 9 32 2-1 2-24 Value = (-1 S )(0.F)(16 (E-64) ) Exponent = power of 16 with bias of 64 Fraction = Normalized F portion of 24 bit fraction 0.F (Bits 9-12 cannot be all zero) 10.0 IBM 64 Bit Double Precision Floating Point 1 2 8 9 64 2-1 2-56 Value = (-1 S )(0.F)(16 (E-64) ) Exponent = power of 16 with bias of 64 Fraction = Normalized F portion of 56 bit fraction 0.F (Bits 9-12 cannot be all zero) O-4

11.0 TI (Texas Instruments) 32 Bit Single Precision Floating Point Exponent S Fraction 1 8 9 10 32 2-1 2-23 Value = ((-2) S +(0.F))(2 E ) Fraction = 2's complement F portion of 24 bit fraction 1.F 0: E = -128 12.0 TI (Texas Instruments) 40 Bit Extended Precision Floating Point Exponent S Fraction 1 8 9 10 40 2-1 2-31 Value = ((-2) S +(0.F))(2 E ) Fraction = 2's complement F portion of 32 bit fraction 1.F 0: E = -128 **** END OF APPENDIX O **** O-5