Imelda C. Go, South Carolina Department of Education, Columbia, SC

PO 082

Rounding in SAS: Preventing Numeric Representation Problems

Imelda C. Go, South Carolina Department of Education, Columbia, SC

ABSTRACT

As SAS programmers, we come from a variety of backgrounds. We range from having little or no formal computer science background to having academic degrees in computer science. Numeric precision and representation are standard topics in computer science education. Programmers eventually encounter numeric precision and representation issues. The associated problems are particularly harmful when the programmer is not aware that they are present and hence takes no programming steps to handle them. For example, we might notice that our numeric comparison results are not what we expected. Consider the SAS statement

if 0.3=3*0.1 then equal='y'; else equal='n';

If you think the result is Y, then this paper is a must-read. Fortunately, there are steps you can take to prevent these types of problems.

Note: Due to differences in hardware limitations and operating systems, the results of the PC SAS examples shown below may differ from the results on other computer systems. No single method of performing calculations is common to all computer systems.

NOT ALL NUMBERS CAN BE REPRESENTED EXACTLY ON THE COMPUTER

Numeric precision (i.e., the accuracy with which a number can be represented) and representation in computers are the roots of the problem. SAS uses floating-point (i.e., real binary) representation. The original decimal number and the binary-represented number may be very close, but very close is not the same as equal. There happens to be no exact binary representation for the decimal values of 0.1 and 0.3, which accounts for the difference in Example #1 below. The advantage of floating-point representation is speed and its disadvantage is representation error. Repeating decimals and irrational numbers are other obvious problems for exact storage on a computer.
For example, 1/3 is equal to a decimal point followed by an infinite number of 3s. Computers cannot store an infinite number of digits. We need to make a distinction between the expected mathematical result (our decimal values) and what the computer can store (binary values) and program accordingly. Readers may refer to two SAS technical support references (TS-230 and TS-654) listed at the end of this paper for in-depth explanations and examples regarding floating-point representation.

EXAMPLE #1

We know, as a mathematical fact, that 3 multiplied by 0.1 is 0.3. Therefore, when we examine the following code, it would seem reasonable to expect that the equal variable will have a value of Y because both variables resolve to 0.3 (at least mathematically). If you are thinking the only possible answer is Y, then you are in for a surprise! Let's look at the PROC PRINT output for the data set above. If we use the following statement with PROC PRINT,

format value1 value2 32.31;

we get the following output. Both values print as 0.3, but that is only as far as the PROC PRINT output goes. The two values are stored in the computer differently. Later in this paper, we note that SAS formats round, and that is why we are not able to see the difference in the values. To see how the values differ, let us use the HEX16. format with PROC PRINT.

format value1 value2 hex16.;

We get the following output that shows the difference between the two values.

EXAMPLE #2

Here is another example. Within each pair of numbers, the difference is mathematically 3.8, but the comparisons fail. Without specifying a format, we get the following results. If we use the following statement with PROC PRINT,

format difference 32.31;

we get the following output.
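The code and output for Examples #1 and #2 are not reproduced in this text, so here is a minimal sketch of what such DATA steps might look like. The data set names and the two number pairs in the second step are hypothetical, chosen only because each pair's difference is mathematically 3.8; they are not necessarily the values used in the original examples. On a platform that stores numbers as 8-byte IEEE floating-point values, such as PC SAS, both steps are expected to set equal to N, but results may vary on other systems.

```sas
/* Sketch of Example #1: compare 0.3 with 3*0.1 */
data example1;
   value1 = 0.3;
   value2 = 3 * 0.1;
   if value1 = value2 then equal = 'Y';
   else equal = 'N';
run;

/* HEX16. displays the stored 8-byte values, so any difference is visible */
proc print data=example1;
   format value1 value2 hex16.;
run;

/* Sketch of Example #2: hypothetical pairs whose difference is mathematically 3.8 */
data example2;
   input a b;
   difference = a - b;
   if difference = 3.8 then equal = 'Y';
   else equal = 'N';
   datalines;
15.7 11.9
20.3 16.5
;
run;

proc print data=example2;
   format difference hex16.;
run;
```

Replacing hex16. with 32.31 in either FORMAT statement gives the decimal view of the same variables.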

EXAMPLE #3

Representation error can become a serious problem when one is unaware it could even happen and takes no precautions against it. Unaccounted for, the errors or discrepancies can accumulate over multiple operations. Let's take the simple example of adding 0.1 ten trillion times. We know the result should be one trillion. After adding all those numbers, SAS produces the following. Over so many calculations, the difference accumulated to 163.124. How serious is that? It all depends on your data. This might still be tolerable for some and totally unacceptable for others. Something else to think about is what happens to other results when the tainted sum is used in other calculations.

COPING WITH THE PROBLEM

We are responsible for our data, programs, and results. The first step in solving the problem is identifying the problem and being aware of the conditions under which it might create undesirable results. As far as the computer science field is concerned, this is a known problem. Most of the published algorithms for numerical analysis are designed to account for and minimize the effect of representation error. (TS-230) Unfortunately there is no one method that best handles the problems caused by numeric representation error. What is a perfectly good solution for one application may drastically affect the performance and results of another application. (TS-230) Hence, this paper focuses on the simplest examples of this problem.

Coping Strategy #1: Keep It Whole

The safest way is to just deal with integers or whole numbers. If the results of your operations on integers are always integers, then there is no problem, because an integer can be stored exactly in the computer as long as it does not exceed the largest integer value the computer can represent. Whether you can stay within the realm of integers depends on what data are involved and what needs to be done to the data. Unless you're just adding, subtracting, and multiplying integers with integers, you could encounter a noninteger when it's time to divide one integer by another.

Consider the following example, which could be monetary amounts, such as dollars and cents. The input values are multiplied by the scale factor of 100 (to move the two digits after the decimal point to the integer side of the number). The INT function, which returns the integer value of the argument, is then applied to remove the representation error that might have been introduced by the decimal or fractional portion of the input.
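The example itself is not reproduced in this text. As a rough sketch of the scaling idea, assuming hypothetical dollar amounts with exactly two decimal places (the data set and variable names are also illustrative), the steps might look like this:

```sas
/* Hypothetical monetary amounts in dollars and cents */
data cents;
   input amount;

   /* Scale by 100 so the cents move to the integer side, then apply INT
      to drop any representation error left in the fractional part */
   amount_cents = int(amount * 100);

   /* Sum statement: running total kept in whole cents (integer arithmetic) */
   total_cents + amount_cents;

   /* Divide by the scale factor only when a decimal result is needed */
   total_amount = total_cents / 100;
   datalines;
0.29
1.15
19.99
;
run;

proc print data=cents;
run;
```

This sketch relies on the fuzzing of INT described in the function table at the end of this paper: a product such as 0.29*100 may be stored a hair below 29, and INT is documented to return the integer when the argument is within 1E-12 of it. Because INT otherwise truncates, inputs with more than two decimal places would lose their extra digits under this approach.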

You can proceed to apply integer arithmetic to the integer values. When you reach the last integer arithmetic result, you can divide it by 100 to regain the decimal portion. You can also apply a similar strategy to percentages. Percentages, such as 18%, can be multiplied by 100 and stored as 18.

Coping Strategy #2: Dare to Compare with Rounded Numbers

In Examples #1 through #3 above, representation error manifested itself in the comparison of values. TS-654 recommends that you keep the following in mind when working with nonintegers or real numbers in general:

- Know your data.
- Decide on the level of significance you need.
- Remember that all numeric values are stored in floating-point representation.
- Use comparison functions, such as ROUND.

You can apply the ROUND function at strategic points in the calculation process (e.g., at the end of a series of calculations, after each calculation). What you do depends on the nature of the data, what you have to do with the data, and when representation error might become an issue. Before making an equality comparison, you can round one or both of the operands. An alternative to rounding is specifying how close two values must be to be considered as good as equal as far as your SAS programming is concerned. This process is called fuzzing the comparison. Refer to TS-230 for examples.

The ROUND function has the following syntax:

ROUND (argument <,rounding-unit>)

It rounds the first argument to the nearest multiple of the second argument. When the rounding unit is unspecified, it rounds to the nearest integer. The SAS 9 Language Reference: Dictionary reassures us that we can expect ROUND to produce decimal arithmetic results if the result has no more than nine significant digits and either of the following conditions is true:

- The rounding unit is an integer or is a power of 10 greater than or equal to 1E-15.
- The expected decimal arithmetic result has no more than four decimal places.
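To make the syntax and the idea of fuzzing concrete, here is a small sketch. The variable names and the tolerance of 1E-9 are assumptions made for illustration; an appropriate tolerance depends on your data, and TS-230 remains the reference for worked fuzzing examples.

```sas
data round_demo;
   x = 1234.5678;

   nearest_integer   = round(x);        /* rounding unit omitted: 1235    */
   nearest_hundredth = round(x, 0.01);  /* nearest multiple of 0.01: 1234.57 */
   nearest_five      = round(x, 5);     /* nearest multiple of 5: 1235    */

   /* Fuzzing the comparison: treat two values as equal when they differ
      by less than a chosen tolerance (hypothetical value shown here) */
   fuzz_factor = 1e-9;
   product = 3 * 0.1;
   if abs(product - 0.3) < fuzz_factor then close_enough = 'Y';
   else close_enough = 'N';
run;

proc print data=round_demo;
run;
```

Rounding changes the stored value, whereas the fuzzed comparison leaves both operands untouched and only relaxes the test for equality.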
Refer to the SAS 9 Language Reference: Dictionary for more details regarding the ROUND function. Should the ROUND function fail to meet your needs, you may specify your own fuzz factor to use with the ROUND function. TS-230 provides examples of how to do this.

Let us modify EXAMPLE #1 to include the ROUND function for both values. This time we can expect the correct mathematical result because the ROUND function, by rounding the values to the first decimal place, returns the value that is based on decimal arithmetic.
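The modified code is not reproduced in this text. A minimal sketch of the change, with a hypothetical data set name, might be:

```sas
/* Example #1 with both values rounded to the first decimal place */
data example1_rounded;
   value1 = round(0.3, 0.1);
   value2 = round(3 * 0.1, 0.1);
   if value1 = value2 then equal = 'Y';
   else equal = 'N';
run;

/* HEX16. lets us check that the two stored values now match */
proc print data=example1_rounded;
   format value1 value2 hex16.;
run;
```

Rounding only at the point of comparison, as in if round(value1, 0.1) = round(value2, 0.1), would be an alternative when the unrounded values must be kept for later calculations.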

Let us modify EXAMPLE #2 to include the ROUND function at the point of comparison.

Let us modify EXAMPLE #3 to include the ROUND function each time addition occurs.

A Word on SAS Formats

Let us suppose you have numbers that have been stored with numeric representation error and all you want to do is print them out with their mathematically correct values. According to TS-230, numeric formats round. The HEX16. format is an exception (i.e., it displays the exact hexadecimal representation of the 8-byte floating-point number stored in a variable). Another exception is the user-defined PICTURE format in PROC FORMAT, which truncates by default. Formats affect how numbers are displayed and do not affect how they are stored or represented internally.
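The following sketch ties these points together under stated assumptions: the pairs in the first step are hypothetical stand-ins for the Example #2 data, and the loop in the second step performs one million additions rather than the ten trillion of Example #3, purely so that it runs quickly. Data set and variable names are illustrative.

```sas
/* Modified Example #2 (hypothetical pairs): round at the point of comparison */
data example2_rounded;
   input a b;
   difference = a - b;
   if round(difference, 0.1) = 3.8 then equal = 'Y';
   else equal = 'N';
   datalines;
15.7 11.9
20.3 16.5
;
run;

/* Modified Example #3, scaled down (assumption): round after every addition */
data example3_rounded;
   plain_total = 0;
   rounded_total = 0;
   do i = 1 to 1000000;
      plain_total = plain_total + 0.1;
      rounded_total = round(rounded_total + 0.1, 0.1);
   end;
   drop i;
run;

/* A numeric format rounds for display only; the stored values are unchanged */
proc print data=example3_rounded;
   format plain_total rounded_total 12.1;
run;

/* HEX16. shows the stored values exactly, so any accumulated error is visible */
proc print data=example3_rounded;
   format plain_total rounded_total hex16.;
run;
```

With the 12.1 format the two totals may print identically even if they are stored differently, which is the distinction between rounding a value with ROUND and rounding its display with a format. Calling ROUND on every iteration adds processing cost, so whether to round after each addition or only at the end is a judgment that depends on the data.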

Functions You May Find Useful

The ROUND and INT functions are not the only defenses against numeric representation problems. Which function you use depends on the nature of your data and computations. Here is a list of functions described in the SAS 9 Language Reference: Dictionary that you may find helpful or insightful as you develop your strategies against these types of issues. The functions that have zero fuzzing (i.e., CEILZ, FLOORZ, INTZ, MODZ, ROUNDZ) do not try to make their results match the values to be expected with decimal arithmetic. Hence, they may produce unexpected results. In other situations, you may have to create your own function to suit your particular needs.

| FUNCTION | SYNTAX | DESCRIPTION |
| --- | --- | --- |
| CEIL | CEIL (argument) | Returns the smallest integer that is greater than or equal to the argument, fuzzed to avoid unexpected floating-point results. (If the argument is within 1E-12 of an integer, the function returns that integer.) |
| CEILZ | CEILZ (argument) | Returns the smallest integer that is greater than or equal to the argument, using zero fuzzing. |
| FLOOR | FLOOR (argument) | Returns the largest integer that is less than or equal to the argument, fuzzed to avoid unexpected floating-point results. (If the argument is within 1E-12 of an integer, the function returns that integer.) |
| FLOORZ | FLOORZ (argument) | Returns the largest integer that is less than or equal to the argument, using zero fuzzing. |
| FUZZ | FUZZ (argument) | Returns the nearest integer if the argument is within 1E-12 of that integer. |
| INT | INT (argument) | Returns the integer value, fuzzed to avoid unexpected floating-point results. (If the argument is within 1E-12 of an integer, the function returns that integer.) |
| INTZ | INTZ (argument) | Returns the integer portion of the argument, using zero fuzzing. |
| MOD | MOD (argument-1, argument-2) | Returns the remainder from the division of the first argument by the second argument, fuzzed to avoid unexpected floating-point results. |
| MODZ | MODZ (argument-1, argument-2) | Returns the remainder from the division of the first argument by the second argument, using zero fuzzing. |
| ROUND | ROUND (argument <,rounding-unit>) | Rounds the first argument to the nearest multiple of the second argument, or to the nearest integer when the second argument is omitted. |
| ROUNDE | ROUNDE (argument <,rounding-unit>) | Rounds the first argument to the nearest multiple of the second argument, and returns an even multiple when the first argument is halfway between the two nearest multiples. |
| ROUNDZ | ROUNDZ (argument <,rounding-unit>) | Rounds the first argument to the nearest multiple of the second argument, with zero fuzzing. |

REFERENCES

SAS Institute Inc. 2002. SAS 9 Language Reference: Concepts. Cary, NC: SAS Institute Inc.

SAS Institute Inc. 2002. SAS 9 Language Reference: Dictionary, Volumes 1 and 2. Cary, NC: SAS Institute Inc.

TS-230: Dealing with numeric representation error in SAS applications. Retrieved July 1, 2008, from the SAS Web site: http://support.sas.com/techsup/technote/ts230.html

TS-654: Numeric precision 101. Retrieved July 1, 2008, from the SAS Web site: http://support.sas.com/techsup/technote/ts654.pdf

TRADEMARK NOTICE

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.