Appendix F Number Systems

Binary Numbers

Decimal notation represents numbers as powers of 10. For example,

1729 decimal = 1 · 10^3 + 7 · 10^2 + 2 · 10^1 + 9 · 10^0

There is no particular reason for the choice of 10, except that several historical number systems were derived from people's counting with their fingers. Other number systems, using a base of 12, 20, or 60, have been used by various cultures throughout human history. However, computers use a number system with base 2 because it is far easier to build electronic components that work with two values, which can be represented by a current being either off or on, than it would be to represent 10 different values of electrical signals. A number written in base 2 is also called a binary number. For example,

1101 binary = 1 · 2^3 + 1 · 2^2 + 0 · 2^1 + 1 · 2^0 = 8 + 4 + 1 = 13
For digits after the decimal point, use negative powers of 2. For example,

1.101 binary = 1 · 2^0 + 1 · 2^-1 + 0 · 2^-2 + 1 · 2^-3
             = 1 + 1/2 + 1/8
             = 1 + 0.5 + 0.125
             = 1.625

In general, to convert a binary number into its decimal equivalent, simply evaluate the powers of 2 corresponding to digits with value 1, and add them up. Table 1 shows the first powers of 2.

Table 1 Powers of Two

Power   Decimal Value
2^0     1
2^1     2
2^2     4
2^3     8
2^4     16
2^5     32
2^6     64
2^7     128
2^8     256
2^9     512
2^10    1,024
2^11    2,048
2^12    4,096
2^13    8,192
2^14    16,384
2^15    32,768
2^16    65,536
To convert a decimal integer into its binary equivalent, keep dividing the integer by 2, keeping track of the remainders. Stop when the number is 0. Then write the remainders as a binary number, starting with the last one. For example,

100 / 2 = 50 remainder 0
 50 / 2 = 25 remainder 0
 25 / 2 = 12 remainder 1
 12 / 2 =  6 remainder 0
  6 / 2 =  3 remainder 0
  3 / 2 =  1 remainder 1
  1 / 2 =  0 remainder 1

Therefore, 100 decimal = 1100100 binary.

Conversely, to convert a fractional number less than 1 to its binary format, keep multiplying by 2. If the result is 1 or greater, subtract 1. Stop when the number is 0. Then use the digits before the decimal point as the binary digits of the fractional part, starting with the first one. For example,

0.35 · 2 = 0.7
0.7  · 2 = 1.4
0.4  · 2 = 0.8
0.8  · 2 = 1.6
0.6  · 2 = 1.2
0.2  · 2 = 0.4

Here the pattern repeats. That is, the binary representation of 0.35 is 0.01 0110 0110 0110...

To convert any floating-point number into binary, convert the whole part and the fractional part separately.

Long, Short, Signed, and Unsigned Integers

There are two important properties that characterize integer values in computers. These are the number of bits used in the representation, and whether the integers are considered to be signed or unsigned. Most computers you are likely to encounter use a 32-bit integer. However, the C++ language does not require this, and there have been machines that used 16-, 20-, 36-, or even 64-bit integers. There are times when it is useful to have integers of different sizes. The C++ language provides two modifiers that are used to declare such integers. A short int (or simply a short) is an integer that, on most
implementations, has fewer bits than an int. (The phrase "on most implementations" is necessary because the language definition only requires that a short integer have no more bits than a standard integer.) On most platforms that use a 32-bit integer, a short is 16 bits. At the other extreme are long integers. As you might expect, a long int (or simply a long) contains no fewer bits than a standard integer. At the present time most personal computers still use a 32-bit long, but processors that provide 64-bit longs have started to appear and will likely become more common in the future. A character (or char) is sometimes used as a very short (8-bit) integer. The C++ programmer therefore has the following hierarchy of integer sizes:

Type    Typical Size
char    8-bit
short   16-bit
int     32-bit
long    32- or 64-bit

The sizeof operator can be used to tell how many bits your compiler assigns to each type. This operator takes a type as its argument and returns the number of bytes that type requires. Multiplying the number of bytes by 8 will tell you the number of bits:

cout << "Number of bytes for char " << sizeof(char)
   << " number of bits " << 8 * sizeof(char) << "\n";
cout << "Number of bytes for short " << sizeof(short)
   << " number of bits " << 8 * sizeof(short) << "\n";
cout << "Number of bytes for int " << sizeof(int)
   << " number of bits " << 8 * sizeof(int) << "\n";
cout << "Number of bytes for long " << sizeof(long)
   << " number of bits " << 8 * sizeof(long) << "\n";

If the only numbers you needed were positive, then the preceding discussion would be everything you needed to know. However, in most applications it is more useful to allow both positive and negative values, and so a more complicated encoding is necessary. This characteristic of an integer is declared using the modifiers signed and unsigned. An unsigned integer holds only nonnegative values.
An unsigned short int that is represented using 16 bits can hold values between 0 and 65,535 (that is, between 0 and 2^16 - 1). A 32-bit unsigned int can represent values between 0 and 4,294,967,295. If no modifier is provided, an integer is assumed to be signed. Allowing both positive and negative values requires changing the representation of an integer value. The details of this representation are described in the next section. However, an important feature is that allowing both positive and negative numbers requires setting aside one bit (the so-called sign bit) to indicate whether the number is positive or negative. This reduces the largest value that can be
represented. The following table shows the range of values that can be represented using signed and unsigned integers of 8, 16, 32, and 64 bits.

Integer Type     Range of Values
8-bit signed     -128 to 127
8-bit unsigned   0 to 255
16-bit signed    -32,768 to 32,767
16-bit unsigned  0 to 65,535
32-bit signed    -2,147,483,648 to 2,147,483,647
32-bit unsigned  0 to 4,294,967,295
64-bit signed    -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
64-bit unsigned  0 to 18,446,744,073,709,551,615

Two's Complement Integers

To represent negative integers, there are two common representations, called signed magnitude and two's complement. Signed magnitude notation is simple: use the leftmost bit for the sign (0 = positive, 1 = negative). For example, when using 8-bit numbers,

-13 = 10001101 signed magnitude

However, building circuitry for adding numbers gets a bit more complicated when one has to take a sign bit into account. The two's complement representation solves this problem. To form the two's complement of a number:

Flip all bits.
Then add 1.

For example, to compute -13 as an 8-bit value, first flip all bits of 00001101 to get 11110010. Then add 1:

-13 = 11110011 two's complement

Now no special circuitry is required for adding two numbers. Just follow the normal rule for addition, with a carry to the next position if the sum of the digits and the prior carry is 2 or 3.
For example,

    1111 111     (carries)
    0000 1101    (+13)
  + 1111 0011    (-13)
  -----------
  1 0000 0000

But only the last 8 bits count, so +13 and -13 add up to 0, as they should. In particular, -1 has two's complement representation 1111...1111, with all bits set. The leftmost bit of a two's complement number is 0 if the number is positive or zero, 1 if it is negative. Two's complement notation with a given number of bits can represent one more negative number than positive numbers. For example, the 8-bit two's complement numbers range from -128 to +127. This phenomenon is an occasional cause of programming errors. For example, consider the following code:

short b = ...;
if (b < 0) b = -b;

This code does not guarantee that b is nonnegative afterward. If short values are 16 bits and b happens to be -32,768, then computing its negative again yields -32,768. (Try it out: take 100...00 (15 zeros), flip all bits, and add 1.)

IEEE Floating-Point Numbers

The Institute of Electrical and Electronics Engineers (IEEE) defines standards for floating-point representations in the IEEE 754 standard. Figure 1 shows how single-precision (float) and double-precision (double) values are decomposed into

A sign bit
An exponent
A mantissa

Figure 1 IEEE Floating-Point Representation
Single precision: 1-bit sign, 8-bit biased exponent (e + 127), 23-bit mantissa (without leading 1)
Double precision: 1-bit sign, 11-bit biased exponent (e + 1023), 52-bit mantissa (without leading 1)
Floating-point numbers use scientific notation, in which a number is represented as

±b0 . b1 b2 b3 ... · 2^e

In this representation, e is the exponent, and the digits b0 . b1 b2 b3 ... form the mantissa. The normalized representation is the one where b0 ≠ 0. For example,

100 decimal = 1100100 binary = 1.100100 binary · 2^6

Because in the binary number system the first bit of a normalized representation must be 1, it is not actually stored in the mantissa. Therefore, you always need to add it back on to obtain the actual value. For example, the mantissa 1.100100 is stored as 100100. The exponent part of the IEEE representation uses neither signed magnitude nor two's complement representation. Instead, a bias is added to the actual exponent. The bias is 127 for single-precision numbers and 1023 for double-precision numbers. For example, the exponent e = 6 would be stored as 133 in a single-precision number. Thus,

100 decimal = 0 10000101 10010000000000000000000 single-precision IEEE

In addition, there are several special values. Among them are:

Zero: biased exponent = 0, mantissa = 0.
Infinity: biased exponent = 11...1, mantissa = 0.
NaN (not a number): biased exponent = 11...1, mantissa ≠ 0.
Hexadecimal Numbers

Because binary numbers can be hard for humans to read, programmers often use the hexadecimal number system, with base 16. The digits are denoted as 0, 1, ..., 9, A, B, C, D, E, F. (See Table 2.) Four binary digits correspond to one hexadecimal digit. That makes it easy to convert between binary and hexadecimal values. For example,

11 1011 0001 binary = 3B1 hexadecimal

In C++, hexadecimal integers are denoted with a 0x prefix, such as 0x3B1.

Table 2 Hexadecimal Digits

Hexadecimal  Decimal  Binary
0            0        0000
1            1        0001
2            2        0010
3            3        0011
4            4        0100
5            5        0101
6            6        0110
7            7        0111
8            8        1000
9            9        1001
A            10       1010
B            11       1011
C            12       1100
D            13       1101
E            14       1110
F            15       1111