Roundoff Errors and Computer Arithmetic

Save this PDF as:

Size: px
Start display at page:

Transcription

1 Jim Lambers Math 105A Summer Session I Lecture 2 Notes These notes correspond to Section 1.2 in the text. Roundoff Errors and Computer Arithmetic In computing the solution to any mathematical problem, there are many sources of error that can impair the accuracy of the computed solution. The study of these sources of error is called error analysis, which will be discussed in the next lecture. In this lecture, we will focus on one type of error that occurs in all computation, whether performed by hand or on a computer: roundoff error. This error is due to the fact that in computation, real numbers can only be represented using a finite number of digits. In general, it is not possible to represent real numbers exactly with this limitation, and therefore they must be approximated by real numbers that can be represented using a fixed number of digits, which is called the precision. Furthermore, as we shall see, arithmetic operations applied to numbers that can be represented exactly using a given precision do not necessarily produce a result that can be represented using the same precision. It follows that if a fixed precision is used, then every arithmetic operation introduces error into a computation. Given that scientific computations can have several sources of error, one would think that it would be foolish to compound the problem by performing arithmetic using fixed precision. As we shall see in this lecture, however, using a fixed precision is far more practical than other options, and, as long as computations are performed carefully, sufficient accuracy can still be achieved. Floating-Point Numbers We now describe a typical system for representing real numbers on a computer. Definition (Floating-point Number System) Given integers β > 1, p 1, L, and U L, a floating-point number system F is defined to be the set of all real numbers of the form x = ±mβ E. The number m is the mantissa of x, and has the form p 1 m = d j β j, j=0 where each digit d j, j = 0,..., p 1 is an integer satisfying 0 d i β 1. The number E is called the exponent or the characteristic of x, and it is an integer satisfying L E U. The integer p is called the precision of F, and β is called the base of F. 1

2 The term floating-point comes from the fact that as a number x F is multiplied by or divided by a power of β, the mantissa does not change, only the exponent. As a result, the decimal point shifts, or floats, to account for the changing exponent. Nearly all computers use a binary floating-point system, in which β = 2. In fact, most computers conform to the IEEE standard for floating-point arithmetic. The standard specifies, among other things, how floating-point numbers are to be represented in memory. Two representations are given, one for single-precision and one for double-precision. Under the standard, single-precision floating-point numbers occupy 4 bytes in memory, with 23 bits used for the mantissa, 8 for the exponent, and one for the sign. IEEE double-precision floating-point numbers occupy eight bytes in memory, with 52 bits used for the mantissa, 11 for the exponent, and one for the sign. Properties of Floating-Point Systems A floating-point system F can only represent a finite subset of the real numbers. As such, it is important to know how large in magnitude a number can be and still be represented, at least approximately, by a number in F. Similarly, it is important to know how small in magnitude a number can be and still be represented by a nonzero number in F; if its magnitude is too small, then it is most accurately represented by zero. Definition (Underflow, Overflow) Let F be a floating-point number system. The smallest positive number in F is called the underflow level, and it has the value UFL = m min β L, where L is the smallest valid exponent and m min is the smallest mantissa. The largest positive number in F is called the overflow level, and it has the value OFL = β U+1 (1 β p ). The value of m min depends on whether floating-point numbers are normalized in F; this point will be discussed later. The overflow level is the value obtained by setting each digit in the mantissa to β 1 and using the largest possible value, U, for the exponent. It is important to note that the real numbers that can be represented in F are not equally spaced along the real line. Numbers having the same exponent are equally spaced, and the spacing between numbers in F decreases as their magnitude decreases. Normalization It is common to normalize floating-point numbers by specifying that the leading digit d 0 of the mantissa be nonzero. In a binary system, with β = 2, this implies that the leading digit is equal to 1, and therefore need not be stored. Therefore, in the IEEE floating-point standard, p = 24 for 2

3 single precision, and p = 53 for double precision, even though only 23 and 52 bits, respectively, are used to store mantissas. In addition to the benefit of gaining one additional bit of precision, normalization also ensures that each floating-point number has a unique representation. One drawback of normalization is that fewer numbers near zero can be represented exactly than if normalization is not used. Therefore, the IEEE standard provides for a practice called gradual underflow, in which the leading digit of the mantissa is allowed to be zero when the exponent is equal to L, thus allowing smaller values of the mantissa. In such a system, the number UFL is equal to β L p+1, whereas in a normalized system, UFL = β L. Rounding A number that can be represented exactly in a floating-point system is called a machine number. Since only finitely many real numbers are machine numbers, it is necessary to determine how nonmachine numbers are to be approximated by machine numbers. The process of choosing a machine number to approximate a non-machine number is called rounding, and the error introduced by such an approximation is called roundoff error. Given a real number x, the machine number obtained by rounding x is denoted by fl(x). In most floating-point systems, rounding is achieved by one of two strategies: chopping, or rounding toward zero, is the simplest strategy, in which the base-β expansion of a number is truncated after the first p digits. As a result, fl(x) is the unique machine number between 0 and x that is nearest to x. rounding to nearest, or rounding to even sets fl(x) to be the machine number that is closest to x in absolute value; if two numbers satisfy this property, then fl(x) is set to be the choice whose last digit is even. Machine Precision In error analysis, it is necessary to estimate error incurred in each step of a computation. As such, it is desirable to know an upper bound for the relative error introduced by rounding. This leads to the following definition. Definition (Machine Precision) Let F be a floating-point number system. The unit roundoff or machine precision, denoted by ɛ mach, is the real number that satisfies fl(x) x x ɛ mach for any real number x such that UFL < x < OFL. An intuitive definition of ɛ mach is that it is the smallest positive number such that fl ( 1 + ɛ mach ) > 1. 3

4 The value of ɛ mach depends on the rounding strategy that is used. If rounding toward zero is used, then ɛ mach = β 1 p, whereas if rounding to nearest is used, ɛ mach = 1 2 β1 p. It is important to avoid confusing ɛ mach with the underflow level UFL. The unit roundoff is determined by the number of digits in the mantissa, whereas the underflow level is determined by the range of allowed exponents. However, we do have the relation that 0 < UFL < ɛ mach. Floating-Point Arithmetic We now discuss the various issues that arise when performing floating-point arithmetic, or finiteprecision arithmetic, which approximates arithmetic operations on real numbers. When adding or subtracting floating-point numbers, it is necessary to shift one of the operands so that both operands have the same exponent, before adding or subtracting the mantissas. As a result, digits of precision are lost in the operand that is smaller in magnitude, and the result of the operation cannot be represented using a machine number. In fact, if x is the smaller operand and y is the larger operand, and x < y ɛ mach, then the result of the operation will simply be y (or y, if y is to be subtracted from x), since the entire value of x is lost in rounding the result. In multiplication or division, the operands need not be shifted, but the mantissas, when multiplied or divided, cannot necessarily be represented using only p digits of precision. The product of two mantissas requires 2p digits to be represented exactly, while the quotient of two mantissas could conceivably require infinitely many digits. Furthermore, overflow or underflow may occur depending on the exponents of the operands, since their sum or difference may lie outside of the interval [L, U]. Because floating-point arithmetic operations are not exact, they do not follow all of the laws of real arithmetic. In particular, floating-point arithmetic is not associative; i.e., x+(y+z) (x+y)+z in floating-point arithmetic. Absolute Error and Relative Error Now that we have been introduced to some specific errors that can occur during computation, we introduce useful terminology for discussing such errors. Suppose that a real number ŷ is an approximation to some real number y. For instance, ŷ may be the closest number to y that can be represented using finite precision, or ŷ may be the result of a sequence of arithmetic operations performed using finite-precision arithmetic, where y is the result of the same operations performed using exact arithmetic. Definition (Absolute Error, Relative Error) Let ŷ be a real number that is an approximation to the real number y. The absolute error in ŷ is E abs = ŷ y. 4

5 The relative error in ŷ is provided that y is nonzero. E rel = ŷ y, y The absolute error is the most natural measure of the accuracy of an approximation, but it can be misleading. Even if the absolute error is small in magnitude, the approximation may still be grossly inaccurate if the exact value y is even smaller in magnitude. For this reason, it is preferable to measure accuracy in terms of the relative error. The magnitude of the relative error in ŷ can be interpreted as a percentage of y. For example, if the relative error is greater than 1 in magnitude, then ŷ can be considered completely erroneous, since the error is larger in magnitude as the exact value. Another useful interpretation of the relative error concerns significant digits, which are all digits excluding leading zeros. Specifically, if the relative error is at most β p, where β is an integer greater than 1, then the representation of ŷ in base β has at least p correct significant digits. It should be noted that the absolute error and relative error are often defined using absolute value; that is, E abs = ŷ y, E rel = ŷ y y. This definition is preferable when one is only interested in the magnitude of the error, which is often the case. If the sign, or direction, of the error is also of interest, then the first definition must be used. Cancellation Subtraction of floating-point numbers presents a unique difficulty, in addition to the rounding error previously discussed. If the operands, after shifting exponents as needed, have leading digits in common, then these digits cancel and the first digit in which the operands do not match becomes the leading digit. However, since each operand is represented using only p digits, it follows that the result contains only p m correct digits, where m is the number of leading digits that cancel. In an extreme case, if the two operands differ by less than ɛ mach, then the result contains no correct digits; it consists entirely of roundoff error from previous computations. This phenomenon is known as catastrophic cancellation. Because of the highly detrimental effect of this cancellation, it is important to ensure that no steps in a computation compute small values from relatively large operands. Often, computations can be rearranged to avoid this risky practice. Other Arithmetic Systems It is natural to ask whether it is feasible to use variable-precision arithmetic, in which the precision is automatically increased to represent results of operations exactly. Variable-precision arithmetic 5

6 has been implemented in some software environments, such as Maple. Unfortunately, the increased accuracy is more than offset by the loss of efficiency. Not only are arithmetic operations more expensive due to the increased precision, but they are implemented in software, rather than hardware, because hardware only performs arithmetic in fixed precision. As a result, variable-precision arithmetic does not enjoy widespread use. To aid in error estimation, some arithmetic systems perform interval analysis on arithmetic operations that are implemented using either fixed or variable precision. In interval analysis, the result of each arithmetic operation is represented not as a single floating-point number but as an interval [a, b] such that the result that would be obtained using exact arithmetic lies within the interval [a, b]. This interval is used to determine the appropriate interval for future computations that use the associated result as an operand. In the case of variable-precision arithmetic, interval analysis can increase efficiency by providing a guide to precision adjustment: if the interval becomes too large, then precision should be increased, perhaps repeating computations in order to ensure sufficient accuracy. 6

Floating-Point Numbers in Digital Computers

POLYTECHNIC UNIVERSITY Department of Computer and Information Science Floating-Point Numbers in Digital Computers K. Ming Leung Abstract: We explain how floating-point numbers are represented and stored

Computational Methods. Sources of Errors

Computational Methods Sources of Errors Manfred Huber 2011 1 Numerical Analysis / Scientific Computing Many problems in Science and Engineering can not be solved analytically on a computer Numeric solutions

Section 1.4 Mathematics on the Computer: Floating Point Arithmetic

Section 1.4 Mathematics on the Computer: Floating Point Arithmetic Key terms Floating point arithmetic IEE Standard Mantissa Exponent Roundoff error Pitfalls of floating point arithmetic Structuring computations

CS321. Introduction to Numerical Methods

CS31 Introduction to Numerical Methods Lecture 1 Number Representations and Errors Professor Jun Zhang Department of Computer Science University of Kentucky Lexington, KY 40506 0633 August 5, 017 Number

2 Computation with Floating-Point Numbers

2 Computation with Floating-Point Numbers 2.1 Floating-Point Representation The notion of real numbers in mathematics is convenient for hand computations and formula manipulations. However, real numbers

Floating Point Representation. CS Summer 2008 Jonathan Kaldor

Floating Point Representation CS3220 - Summer 2008 Jonathan Kaldor Floating Point Numbers Infinite supply of real numbers Requires infinite space to represent certain numbers We need to be able to represent

Floating Point Representation in Computers

Floating Point Representation in Computers Floating Point Numbers - What are they? Floating Point Representation Floating Point Operations Where Things can go wrong What are Floating Point Numbers? Any

1.2 Round-off Errors and Computer Arithmetic

1.2 Round-off Errors and Computer Arithmetic 1 In a computer model, a memory storage unit word is used to store a number. A word has only a finite number of bits. These facts imply: 1. Only a small set

Floating-point representation

Lecture 3-4: Floating-point representation and arithmetic Floating-point representation The notion of real numbers in mathematics is convenient for hand computations and formula manipulations. However,

Chapter 03: Computer Arithmetic. Lesson 09: Arithmetic using floating point numbers

Chapter 03: Computer Arithmetic Lesson 09: Arithmetic using floating point numbers Objective To understand arithmetic operations in case of floating point numbers 2 Multiplication of Floating Point Numbers

Classes of Real Numbers 1/2. The Real Line

Classes of Real Numbers All real numbers can be represented by a line: 1/2 π 1 0 1 2 3 4 real numbers The Real Line { integers rational numbers non-integral fractions irrational numbers Rational numbers

Most nonzero floating-point numbers are normalized. This means they can be expressed as. x = ±(1 + f) 2 e. 0 f < 1

Floating-Point Arithmetic Numerical Analysis uses floating-point arithmetic, but it is just one tool in numerical computation. There is an impression that floating point arithmetic is unpredictable and

Computer Representation of Numbers and Computer Arithmetic

Computer Representation of Numbers and Computer Arithmetic January 17, 2017 Contents 1 Binary numbers 6 2 Memory 9 3 Characters 12 1 c A. Sandu, 1998-2015. Distribution outside classroom not permitted.

Floating-Point Arithmetic

Floating-Point Arithmetic ECS30 Winter 207 January 27, 207 Floating point numbers Floating-point representation of numbers (scientific notation) has four components, for example, 3.46 0 sign significand

Independent Representation

Independent Representation 1 Integer magnitude + 1 2 3 4 5 6 7 8 9 0 Float sign sign of exponent mantissa On standard 32 bit machines: INT_MAX = 2147483647 which gives 10 digits of precision, (i.e. the

Table : IEEE Single Format ± a a 2 a 3 :::a 8 b b 2 b 3 :::b 23 If exponent bitstring a :::a 8 is Then numerical value represented is ( ) 2 = (

Floating Point Numbers in Java by Michael L. Overton Virtually all modern computers follow the IEEE 2 floating point standard in their representation of floating point numbers. The Java programming language

Computational Mathematics: Models, Methods and Analysis. Zhilin Li

Computational Mathematics: Models, Methods and Analysis Zhilin Li Chapter 1 Introduction Why is this course important (motivations)? What is the role of this class in the problem solving process using

Lecture Notes: Floating-Point Numbers

Lecture Notes: Floating-Point Numbers CS227-Scientific Computing September 8, 2010 What this Lecture is About How computers represent numbers How this affects the accuracy of computation Positional Number

Module 2: Computer Arithmetic

Module 2: Computer Arithmetic 1 B O O K : C O M P U T E R O R G A N I Z A T I O N A N D D E S I G N, 3 E D, D A V I D L. P A T T E R S O N A N D J O H N L. H A N N E S S Y, M O R G A N K A U F M A N N

Floating Point. The World is Not Just Integers. Programming languages support numbers with fraction

1 Floating Point The World is Not Just Integers Programming languages support numbers with fraction Called floating-point numbers Examples: 3.14159265 (π) 2.71828 (e) 0.000000001 or 1.0 10 9 (seconds in

Floating Point Arithmetic

Floating Point Arithmetic CS 365 Floating-Point What can be represented in N bits? Unsigned 0 to 2 N 2s Complement -2 N-1 to 2 N-1-1 But, what about? very large numbers? 9,349,398,989,787,762,244,859,087,678

Floating Point Considerations

Chapter 6 Floating Point Considerations In the early days of computing, floating point arithmetic capability was found only in mainframes and supercomputers. Although many microprocessors designed in the

4 Operations On Data 4.1. Foundations of Computer Science Cengage Learning

4 Operations On Data 4.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: List the three categories of operations performed on data.

Scientific Computing. Error Analysis

ECE257 Numerical Methods and Scientific Computing Error Analysis Today s s class: Introduction to error analysis Approximations Round-Off Errors Introduction Error is the difference between the exact solution

Floating Point Numbers

Floating Point Numbers Summer 8 Fractional numbers Fractional numbers fixed point Floating point numbers the IEEE 7 floating point standard Floating point operations Rounding modes CMPE Summer 8 Slides

Floating-point numbers. Phys 420/580 Lecture 6

Floating-point numbers Phys 420/580 Lecture 6 Random walk CA Activate a single cell at site i = 0 For all subsequent times steps, let the active site wander to i := i ± 1 with equal probability Random

CS101 Lecture 04: Binary Arithmetic

CS101 Lecture 04: Binary Arithmetic Binary Number Addition Two s complement encoding Briefly: real number representation Aaron Stevens (azs@bu.edu) 25 January 2013 What You ll Learn Today Counting in binary

Floating-Point Data Representation and Manipulation 198:231 Introduction to Computer Organization Lecture 3

Floating-Point Data Representation and Manipulation 198:231 Introduction to Computer Organization Instructor: Nicole Hynes nicole.hynes@rutgers.edu 1 Fixed Point Numbers Fixed point number: integer part

1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM

1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM 1.1 Introduction Given that digital logic and memory devices are based on two electrical states (on and off), it is natural to use a number

Floating point. Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties. Next time. !

Floating point Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties Next time! The machine model Chris Riesbeck, Fall 2011 Checkpoint IEEE Floating point Floating

The Perils of Floating Point

The Perils of Floating Point by Bruce M. Bush Copyright (c) 1996 Lahey Computer Systems, Inc. Permission to copy is granted with acknowledgement of the source. Many great engineering and scientific advances

AMTH142 Lecture 10. Scilab Graphs Floating Point Arithmetic

AMTH142 Lecture 1 Scilab Graphs Floating Point Arithmetic April 2, 27 Contents 1.1 Graphs in Scilab......................... 2 1.1.1 Simple Graphs...................... 2 1.1.2 Line Styles........................

Number Systems and Computer Arithmetic

Number Systems and Computer Arithmetic Counting to four billion two fingers at a time What do all those bits mean now? bits (011011011100010...01) instruction R-format I-format... integer data number text

Inf2C - Computer Systems Lecture 2 Data Representation

Inf2C - Computer Systems Lecture 2 Data Representation Boris Grot School of Informatics University of Edinburgh Last lecture Moore s law Types of computer systems Computer components Computer system stack

CHAPTER 5 Computer Arithmetic and Round-Off Errors

CHAPTER 5 Computer Arithmetic and Round-Off Errors In the two previous chapters we have seen how numbers can be represented in the binary numeral system and how this is the basis for representing numbers

Logic, Words, and Integers

Computer Science 52 Logic, Words, and Integers 1 Words and Data The basic unit of information in a computer is the bit; it is simply a quantity that takes one of two values, 0 or 1. A sequence of k bits

Computer Arithmetic Ch 8

Computer Arithmetic Ch 8 ALU Integer Representation Integer Arithmetic Floating-Point Representation Floating-Point Arithmetic 1 Arithmetic Logical Unit (ALU) (2) Does all work in CPU (aritmeettis-looginen

Floating Point Puzzles The course that gives CMU its Zip! Floating Point Jan 22, IEEE Floating Point. Fractional Binary Numbers.

class04.ppt 15-213 The course that gives CMU its Zip! Topics Floating Point Jan 22, 2004 IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Floating Point Puzzles For

CHAPTER 2 Data Representation in Computer Systems

CHAPTER 2 Data Representation in Computer Systems 2.1 Introduction 37 2.2 Positional Numbering Systems 38 2.3 Decimal to Binary Conversions 38 2.3.1 Converting Unsigned Whole Numbers 39 2.3.2 Converting

Floating Point Arithmetic

Floating Point Arithmetic Floating point numbers are frequently used in many applications. Implementation of arithmetic units such as adder, multiplier, etc for Floating point numbers are more complex

4 Operations On Data 4.1. Foundations of Computer Science Cengage Learning

4 Operations On Data 4.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: List the three categories of operations performed on data.

CHAPTER 2 Data Representation in Computer Systems

CHAPTER 2 Data Representation in Computer Systems 2.1 Introduction 37 2.2 Positional Numbering Systems 38 2.3 Decimal to Binary Conversions 38 2.3.1 Converting Unsigned Whole Numbers 39 2.3.2 Converting

Data Representation Floating Point

Data Representation Floating Point CSCI 2400 / ECE 3217: Computer Architecture Instructor: David Ferry Slides adapted from Bryant & O Hallaron s slides via Jason Fritts Today: Floating Point Background:

What Every Programmer Should Know About Floating-Point Arithmetic

What Every Programmer Should Know About Floating-Point Arithmetic Last updated: October 15, 2015 Contents 1 Why don t my numbers add up? 3 2 Basic Answers 3 2.1 Why don t my numbers, like 0.1 + 0.2 add

CO Computer Architecture and Programming Languages CAPL. Lecture 15

CO20-320241 Computer Architecture and Programming Languages CAPL Lecture 15 Dr. Kinga Lipskoch Fall 2017 How to Compute a Binary Float Decimal fraction: 8.703125 Integral part: 8 1000 Fraction part: 0.703125

Characters, Strings, and Floats

Characters, Strings, and Floats CS 350: Computer Organization & Assembler Language Programming 9/6: pp.8,9; 9/28: Activity Q.6 A. Why? We need to represent textual characters in addition to numbers. Floating-point

COMP 122/L Lecture 2. Kyle Dewey

COMP 122/L Lecture 2 Kyle Dewey Outline Operations on binary values AND, OR, XOR, NOT Bit shifting (left, two forms of right) Addition Subtraction Twos complement Bitwise Operations Bitwise AND Similar

Chapter 2 Float Point Arithmetic. Real Numbers in Decimal Notation. Real Numbers in Decimal Notation

Chapter 2 Float Point Arithmetic Topics IEEE Floating Point Standard Fractional Binary Numbers Rounding Floating Point Operations Mathematical properties Real Numbers in Decimal Notation Representation

15213 Recitation 2: Floating Point

15213 Recitation 2: Floating Point 1 Introduction This handout will introduce and test your knowledge of the floating point representation of real numbers, as defined by the IEEE standard. This information

Truncation Errors. Applied Numerical Methods with MATLAB for Engineers and Scientists, 2nd ed., Steven C. Chapra, McGraw Hill, 2008, Ch. 4.

Chapter 4: Roundoff and Truncation Errors Applied Numerical Methods with MATLAB for Engineers and Scientists, 2nd ed., Steven C. Chapra, McGraw Hill, 2008, Ch. 4. 1 Outline Errors Accuracy and Precision

System Programming CISC 360. Floating Point September 16, 2008

System Programming CISC 360 Floating Point September 16, 2008 Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Powerpoint Lecture Notes for Computer Systems:

Computer Architecture and Organization

3-1 Chapter 3 - Arithmetic Computer Architecture and Organization Miles Murdocca and Vincent Heuring Chapter 3 Arithmetic 3-2 Chapter 3 - Arithmetic Chapter Contents 3.1 Fixed Point Addition and Subtraction

What Every Computer Scientist Should Know About Floating Point Arithmetic

What Every Computer Scientist Should Know About Floating Point Arithmetic E Note This document is an edited reprint of the paper What Every Computer Scientist Should Know About Floating-Point Arithmetic,

C NUMERIC FORMATS. Overview. IEEE Single-Precision Floating-point Data Format. Figure C-0. Table C-0. Listing C-0.

C NUMERIC FORMATS Figure C-. Table C-. Listing C-. Overview The DSP supports the 32-bit single-precision floating-point data format defined in the IEEE Standard 754/854. In addition, the DSP supports an

Computer Numbers and their Precision, I Number Storage

Computer Numbers and their Precision, I Number Storage Learning goal: To understand how the ways computers store numbers lead to limited precision and how that introduces errors into calculations. Learning

Chapter 10 - Computer Arithmetic

Chapter 10 - Computer Arithmetic Luis Tarrataca luis.tarrataca@gmail.com CEFET-RJ L. Tarrataca Chapter 10 - Computer Arithmetic 1 / 126 1 Motivation 2 Arithmetic and Logic Unit 3 Integer representation

Floating-point Arithmetic. where you sum up the integer to the left of the decimal point and the fraction to the right.

Floating-point Arithmetic Reading: pp. 312-328 Floating-Point Representation Non-scientific floating point numbers: A non-integer can be represented as: 2 4 2 3 2 2 2 1 2 0.2-1 2-2 2-3 2-4 where you sum

Computational Methods CMSC/AMSC/MAPL 460. Representing numbers in floating point Error Analysis. Ramani Duraiswami, Dept. of Computer Science

Computational Methods CMSC/AMSC/MAPL 460 Representing numbers in floating point Error Analysis Ramani Duraiswami, Dept. of Computer Science Class Outline Recap of floating point representation Matlab demos

Systems I. Floating Point. Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties

Systems I Floating Point Topics IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties IEEE Floating Point IEEE Standard 754 Established in 1985 as uniform standard for

The Sign consists of a single bit. If this bit is '1', then the number is negative. If this bit is '0', then the number is positive.

IEEE 754 Standard - Overview Frozen Content Modified by on 13-Sep-2017 Before discussing the actual WB_FPU - Wishbone Floating Point Unit peripheral in detail, it is worth spending some time to look at

IT 1204 Section 2.0. Data Representation and Arithmetic. 2009, University of Colombo School of Computing 1

IT 1204 Section 2.0 Data Representation and Arithmetic 2009, University of Colombo School of Computing 1 What is Analog and Digital The interpretation of an analog signal would correspond to a signal whose

UNIT - I: COMPUTER ARITHMETIC, REGISTER TRANSFER LANGUAGE & MICROOPERATIONS

UNIT - I: COMPUTER ARITHMETIC, REGISTER TRANSFER LANGUAGE & MICROOPERATIONS (09 periods) Computer Arithmetic: Data Representation, Fixed Point Representation, Floating Point Representation, Addition and

Practical Numerical Methods in Physics and Astronomy. Lecture 1 Intro & IEEE Variable Types and Arithmetic

Practical Numerical Methods in Physics and Astronomy Lecture 1 Intro & IEEE Variable Types and Arithmetic Pat Scott Department of Physics, McGill University January 16, 2013 Slides available from http://www.physics.mcgill.ca/

VHDL IMPLEMENTATION OF IEEE 754 FLOATING POINT UNIT

VHDL IMPLEMENTATION OF IEEE 754 FLOATING POINT UNIT Ms. Anjana Sasidharan Student, Vivekanandha College of Engineering for Women, Namakkal, Tamilnadu, India. Abstract IEEE-754 specifies interchange and

Giving credit where credit is due

CSCE 230J Computer Organization Floating Point Dr. Steve Goddard goddard@cse.unl.edu http://cse.unl.edu/~goddard/courses/csce230j Giving credit where credit is due Most of slides for this lecture are based

Chapter 2: Number Systems

Chapter 2: Number Systems Logic circuits are used to generate and transmit 1s and 0s to compute and convey information. This two-valued number system is called binary. As presented earlier, there are many

Calculations with Sig Figs

Calculations with Sig Figs When you make calculations using data with a specific level of uncertainty, it is important that you also report your answer with the appropriate level of uncertainty (i.e.,

Numerical Precision. Or, why my numbers aren t numbering right. 1 of 15

Numerical Precision Or, why my numbers aren t numbering right 1 of 15 What s the deal? Maybe you ve seen this #include int main() { float val = 3.6f; printf( %.20f \n, val); Print a float with

Floating point. Today. IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Next time.

Floating point Today IEEE Floating Point Standard Rounding Floating Point Operations Mathematical properties Next time The machine model Fabián E. Bustamante, Spring 2010 IEEE Floating point Floating point

CS429: Computer Organization and Architecture

CS429: Computer Organization and Architecture Dr. Bill Young Department of Computer Sciences University of Texas at Austin Last updated: September 18, 2017 at 12:48 CS429 Slideset 4: 1 Topics of this Slideset

Numeric Precision 101

www.sas.com > Service and Support > Technical Support TS Home Intro to Services News and Info Contact TS Site Map FAQ Feedback TS-654 Numeric Precision 101 This paper is intended as a basic introduction

Representing and Manipulating Floating Points

Representing and Manipulating Floating Points Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu The Problem How to represent fractional values with

Imelda C. Go, South Carolina Department of Education, Columbia, SC

PO 082 Rounding in SAS : Preventing Numeric Representation Problems Imelda C. Go, South Carolina Department of Education, Columbia, SC ABSTRACT As SAS programmers, we come from a variety of backgrounds.

Lecture 5. Computer Arithmetic. Ch 9 [Stal10] Integer arithmetic Floating-point arithmetic

Lecture 5 Computer Arithmetic Ch 9 [Stal10] Integer arithmetic Floating-point arithmetic ALU ALU = Arithmetic Logic Unit (aritmeettis-looginen yksikkö) Actually performs operations on data Integer and

Numerical Methods in Physics. Lecture 1 Intro & IEEE Variable Types and Arithmetic

Variable types Numerical Methods in Physics Lecture 1 Intro & IEEE Variable Types and Arithmetic Pat Scott Department of Physics, Imperial College November 1, 2016 Slides available from http://astro.ic.ac.uk/pscott/

More Programming Constructs -- Introduction

More Programming Constructs -- Introduction We can now examine some additional programming concepts and constructs Chapter 5 focuses on: internal data representation conversions between one data type and

The Design and Implementation of a Rigorous. A Rigorous High Precision Floating Point Arithmetic. for Taylor Models

The and of a Rigorous High Precision Floating Point Arithmetic for Taylor Models Department of Physics, Michigan State University East Lansing, MI, 48824 4th International Workshop on Taylor Methods Boca

3.5 Floating Point: Overview

3.5 Floating Point: Overview Floating point (FP) numbers Scientific notation Decimal scientific notation Binary scientific notation IEEE 754 FP Standard Floating point representation inside a computer

Lesson 1: THE DECIMAL SYSTEM

Lesson 1: THE DECIMAL SYSTEM The word DECIMAL comes from a Latin word, which means "ten. The Decimal system uses the following ten digits to write a number: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. Each time

Signed umbers. Sign/Magnitude otation

Signed umbers So far we have discussed unsigned number representations. In particular, we have looked at the binary number system and shorthand methods in representing binary codes. With m binary digits,

1620 DATA PRQCESSING SYSTEM BULLETlli. AUTOMATIC FLOATING-POINT OPERATIONS (Special Feature)

1620 DATA PRQCESSING SYSTEM BULLETlli AUTOMATIC FLOATINGPOINT OPERATIONS (Special Feature) The Automatic FloatingPoint Operations special feature provides the IBM 1620 Data Processing System (Figure 1)

Arithmetic Logic Unit

Arithmetic Logic Unit A.R. Hurson Department of Computer Science Missouri University of Science & Technology A.R. Hurson 1 Arithmetic Logic Unit It is a functional bo designed to perform "basic" arithmetic,

Number Systems. Binary Numbers. Appendix. Decimal notation represents numbers as powers of 10, for example

Appendix F Number Systems Binary Numbers Decimal notation represents numbers as powers of 10, for example 1729 1 103 7 102 2 101 9 100 decimal = + + + There is no particular reason for the choice of 10,

Numerical methods in physics

Numerical methods in physics Marc Wagner Goethe-Universität Frankfurt am Main summer semester 2018 Version: April 9, 2018 1 Contents 1 Introduction 3 2 Representation of numbers in computers, roundoff

Co-processor Math Processor. Richa Upadhyay Prabhu. NMIMS s MPSTME February 9, 2016

8087 Math Processor Richa Upadhyay Prabhu NMIMS s MPSTME richa.upadhyay@nmims.edu February 9, 2016 Introduction Need of Math Processor: In application where fast calculation is required Also where there

Floating Point. CSC207 Fall 2017

Floating Point CSC207 Fall 2017 Ariane 5 Rocket Launch Ariane 5 rocket explosion In 1996, the European Space Agency s Ariane 5 rocket exploded 40 seconds after launch. During conversion of a 64-bit to

Floating-Point Arithmetic

Floating-Point Arithmetic if ((A + A) - A == A) { SelfDestruct() } Reading: Study Chapter 3. L12 Multiplication 1 Approximating Real Numbers on Computers Thus far, we ve entirely ignored one of the most

Floating Point Numbers. Lecture 9 CAP

Floating Point Numbers Lecture 9 CAP 3103 06-16-2014 Review of Numbers Computers are made to deal with numbers What can we represent in N bits? 2 N things, and no more! They could be Unsigned integers:

Representation of Numbers and Arithmetic in Signal Processors

Representation of Numbers and Arithmetic in Signal Processors 1. General facts Without having any information regarding the used consensus for representing binary numbers in a computer, no exact value

Data Representation and Introduction to Visualization

Data Representation and Introduction to Visualization Massimo Ricotti ricotti@astro.umd.edu University of Maryland Data Representation and Introduction to Visualization p.1/18 VISUALIZATION Visualization

EC121 Mathematical Techniques A Revision Notes

EC Mathematical Techniques A Revision Notes EC Mathematical Techniques A Revision Notes Mathematical Techniques A begins with two weeks of intensive revision of basic arithmetic and algebra, to the level

Computer arithmetics: integers, binary floating-point, and decimal floating-point

n!= 0 && -n == n z+1 == z Computer arithmetics: integers, binary floating-point, and decimal floating-point v+w-w!= v x+1 < x Peter Sestoft 2010-02-16 y!= y p == n && 1/p!= 1/n 1 Computer arithmetics Computer

CSC201, SECTION 002, Fall 2000: Homework Assignment #2

1 of 7 11/8/2003 7:34 PM CSC201, SECTION 002, Fall 2000: Homework Assignment #2 DUE DATE Monday, October 2, at the start of class. INSTRUCTIONS FOR PREPARATION Neat, in order, answers easy to find. Staple

Preliminary Decimal-Floating-Point Architecture

z/architecture Preliminary Decimal-Floating-Point Architecture November, 2006 SA23-2232-00 ii Preliminary Decimal-Floating-Point Architecture Note: Before using this information and the product it supports,

ECE 2020B Fundamentals of Digital Design Spring problems, 6 pages Exam Two 26 February 2014

Instructions: This is a closed book, closed note exam. Calculators are not permitted. If you have a question, raise your hand and I will come to you. Please work the exam in pencil and do not separate

Chapter 3 Arithmetic for Computers (Part 2)

Department of Electr rical Eng ineering, Chapter 3 Arithmetic for Computers (Part 2) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Eng ineering, Feng-Chia Unive

CMSC 313 Lecture 03 Multiple-byte data big-endian vs little-endian sign extension Multiplication and division Floating point formats Character Codes

Multiple-byte data CMSC 313 Lecture 03 big-endian vs little-endian sign extension Multiplication and division Floating point formats Character Codes UMBC, CMSC313, Richard Chang 4-5 Chapter

Computer Systems C S Cynthia Lee

Computer Systems C S 1 0 7 Cynthia Lee 2 Today s Topics LECTURE: Floating point! Real Numbers and Approximation MATH TIME! Some preliminary observations on approximation We know that some non-integer numbers