In this lesson you will learn:
- how to add and multiply positive binary integers
- how to work with signed binary numbers using two's complement
- how fixed and floating point numbers are used to represent fractions
- what overflow and underflow are
- how to round binary numbers
- what precision and normalisation are with reference to binary numbers.

Adding unsigned binary numbers

    0 0 1 1 0 0 1 0
  + 1 0 1 1 0 1 0 1
    ---------------
    1 1 1 0 0 1 1 1
        1 1           (carries)

0 + 0 will equal 0, so put 0 on the answer line. 0 + 1 or 1 + 0 will both equal 1, so put 1 on the answer line. 1 + 1 will equal 10 (one, zero), so you put 0 on the answer line and carry the 1. 1 + 1 + 1 will equal 11 (one, one), so you put 1 on the answer line and carry the 1.

Multiplying unsigned binary numbers

Two's complement binary to decimal

Two's complement is a method used to represent signed integers in binary form. To convert the binary code 10011100 into decimal using two's complement:
1. Write out the denary equivalents, noting that with two's complement the most significant bit becomes negative. Using an 8-bit code, this means that the MSB represents a value of -128.
2. Write in the binary code.
3. Add up the values: -128 + 16 + 8 + 4 = -100.
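The column method and the two's complement decoding steps above can be sketched in Python. This is an illustrative sketch; the function names are our own choices, not part of the lesson:

```python
# Add two unsigned binary strings column by column, mirroring the method above.
def add_unsigned(a: str, b: str) -> str:
    result, carry = [], 0
    for x, y in zip(reversed(a), reversed(b)):
        total = int(x) + int(y) + carry   # 0, 1, 2 or 3 for this column
        result.append(str(total % 2))     # bit that goes on the answer line
        carry = total // 2                # carry into the next column
    if carry:
        result.append("1")
    return "".join(reversed(result))

# Decode an 8-bit two's complement string: the MSB counts as -128.
def twos_complement_to_decimal(bits: str) -> int:
    value = -128 * int(bits[0])
    for i, bit in enumerate(bits[1:], start=1):
        value += int(bit) * 2 ** (7 - i)
    return value

print(add_unsigned("00110010", "10110101"))    # -> 11100111, the worked sum above
print(twos_complement_to_decimal("10011100"))  # -> -100, i.e. -128 + 16 + 8 + 4
```

Note how the carry logic matches the rules above: a column total of 2 (binary 10) puts 0 on the answer line and carries 1; a total of 3 (binary 11) puts 1 and carries 1.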
Two's complement decimal to binary

To convert -102 into binary:
1. First write out the binary equivalent of +102: 01100110.
2. Starting at the LSB, copy the bits up to and including the first 1.
3. Invert all the remaining bits, that is, 0 becomes 1 and 1 becomes 0.
The number becomes 10011010 and is now in two's complement.

Fixed point numbers

In order to represent real decimal numbers, that is, numbers with decimal places or a fractional part, fixed point representation can be used. In the same way that decimal has a decimal point, binary has a binary point. The binary point is not actually stored in the 8-bit code; its position is fixed by the programmer. It is shown here purely to aid understanding. For example, with four bits either side of the point, the code 0110.0110 is 4 + 2 + 1/4 + 1/8, giving a total of 6 3/8 or 6.375.

Location of the binary point

The binary point can be located anywhere in the number. With a fixed binary point as shown, the smallest number that could be represented other than zero is 0000.0001, which is 1/16 or 0.0625. The next number we could represent is 0000.0010, which is 2/16 (that is, 1/8) or 0.125. It is not possible to represent any number between 0.0625 and 0.125. With a fixed binary point as shown, the largest number we could represent is 1111.1111, which is 15 15/16 or 15.9375. Moving the binary point to the left means that we can represent more accurate fractions but reduces the range of whole numbers available. Moving the binary point to the right increases the range of whole numbers but reduces the accuracy. It remains the case that with an 8-bit code we can represent 256 different combinations, regardless of where we put the binary point.
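Both ideas on this page, encoding a negative integer and reading a fixed point code, can be sketched in Python. The helper names and the 4.4 bit layout are illustrative assumptions matching the examples above:

```python
# Encode a signed integer as an 8-bit two's complement string.
# Masking with 0xFF is equivalent to the "invert and add 1" shortcut.
def to_twos_complement(n: int, bits: int = 8) -> str:
    return format(n & (2 ** bits - 1), f"0{bits}b")

# Interpret an 8-bit code as fixed point with the binary point in the
# middle, i.e. bit weights 8 4 2 1 . 1/2 1/4 1/8 1/16 (the 4.4 layout above).
def fixed_point_value(code: str) -> float:
    weights = [8, 4, 2, 1, 0.5, 0.25, 0.125, 0.0625]
    return sum(w for bit, w in zip(code, weights) if bit == "1")

print(to_twos_complement(-102))       # -> 10011010, as in the worked example
print(fixed_point_value("01100110"))  # -> 6.375, i.e. 4 + 2 + 1/4 + 1/8
```

All the weights here are powers of two, so `float` stores these particular values exactly and no rounding error creeps into the example.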
Negative fixed point binary

The same techniques can also be used to represent negative numbers. With the MSB worth -8, the code 1110.1100 would be: -8 + 4 + 2 + ½ + ¼ = -1¼.

Floating point numbers

The problem with all the 8-bit systems we have investigated so far is that they can only store a very limited range of numbers. The biggest positive number we have been able to store so far is only 255, and the smallest positive number is 0.0625. To get around this, more bits can be allocated:
- Allocate more bits to store the number. For example, a 16-bit unsigned code would allow you to store all the integers from 0 to 65 535; a 24-bit code would allow you to cope with 16 777 216 different combinations; and so on.
- If you wanted to store negative and positive numbers you would need to use the two's complement system outlined above. Using 16 bits would allow you to store numbers between -32 768 and 32 767.
- Using floating point, the binary point can be moved depending on the number that you are trying to represent.

How floating point works

A floating point number is made up of two parts: the mantissa and the exponent. In decimal, we often have to calculate large numbers on our calculators. Most calculators have an eight- or ten-digit display, and often the numbers we are calculating need more digits. When this happens, an exponent and mantissa are used. For example, the number 450 000 000 000, which is 45 followed by ten zeros, would be shown as 4.5 × 10^11. 4.5 is the mantissa and 11 is the exponent, meaning that the decimal point is moved 11 places to the right. In binary, the exponent and mantissa are used to allow the binary point to float, as in the previous example. Remember that the mantissa and/or the exponent may be negative, as the two's complement method is also used on each part. Consider the following 12-bit code: 000011000011.
The code can be broken down as follows:
- the first eight bits are the mantissa, which can be broken down further:
  o the MSB is 0, which means that the number is positive;
  o the next seven bits are the rest of the mantissa: 0001100;
- the remaining four bits are the exponent: 0011.
Example

Mantissa: 00001100   Exponent: 0011

1. Calculate the exponent. Work out the exponent in the usual way, remembering that two's complement is being used. The exponent in this case is +3. This means that the binary point will float three places to the right.

2. Calculate the mantissa. The binary point is always placed after the most significant bit, giving 0.0001100. The point now floats three places to the right, giving 000.1100. The values for the conversion have changed because the binary point has now moved: 1/2 + 1/4 = 0.75.

Therefore, 00001100 0011 = 0.75.

Advantages of floating point

The advantages of using floating point are:
- a much wider range of numbers can be produced with the same number of bits as the fixed point system
- consequently, floating point lends itself to applications where a wide range of values may need to be represented.

Advantages of fixed point

- The values are handled in the same way as decimal values, so any hardware configured to work with integers can be used on reals. This also makes the processing of fixed point numbers faster than floating point values, as there is no processing required to move the binary point.
- The absolute error will always be the same, whereas with floating point numbers the absolute error could vary. This means that precision is retained, albeit within a more limited range than floating point representation.
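The two steps of the worked example can be sketched in Python. The 8-bit mantissa / 4-bit exponent split and the function name are assumptions taken from the format described above:

```python
# Decode a 12-bit floating point code: an 8-bit two's complement mantissa
# (binary point after the sign bit) followed by a 4-bit two's complement exponent.
def decode_float(code: str) -> float:
    mantissa_bits, exponent_bits = code[:8], code[8:]

    # Step 1: the exponent is 4-bit two's complement, so its MSB is worth -8.
    exponent = int(exponent_bits, 2)
    if exponent_bits[0] == "1":
        exponent -= 16

    # Step 2: the sign bit is worth -1; the remaining bits are worth
    # 1/2, 1/4, 1/8, ... after the binary point.
    mantissa = -1.0 * int(mantissa_bits[0])
    for i, bit in enumerate(mantissa_bits[1:], start=1):
        mantissa += int(bit) * 2 ** -i

    # Floating the point e places to the right is multiplying by 2**e.
    return mantissa * 2 ** exponent

print(decode_float("000011000011"))  # -> 0.75, i.e. 0.0001100 floated 3 right
```

Because every quantity involved is a power of two, the result 0.75 is exact; the same code also handles negative mantissas and negative exponents via the two's complement adjustments.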
- They are suited to applications where speed is more important than precision, e.g. some aspects of gaming or digital signal processing.
- They are suited to applications where an absolute level of precision is required, e.g. currency, where the number of places after the binary point does not need to change.

Underflow and overflow

It is possible when using signed binary that you will generate a number that is either too large or too small to be represented by the number of bits available to store it. When the number is too large, we get an overflow; when it is too close to zero to represent, we get an underflow. An example of overflow: if 00000001 (1 in decimal) were added to 01111111 (127 in decimal), it would generate the answer 10000000 (128 in decimal). However, using 8-bit two's complement, we would actually get the answer -128.

Normalisation and precision

Normalisation is a technique used to ensure that when numbers are represented they are as precise as possible in relation to how many bits are being used. The exponent is used to ensure that the floating point is placed to optimise the precision of the number. For example, 234 000 can be represented as:
- 23 400 × 10^1
- 2.34 × 10^5
- 0.00000234 × 10^11
The second option is the best way to represent the number as it uses the fewest digits, yet provides a precise result.

Normalised binary numbers

With binary codes, normalisation is equally important. In order to be normalised, the first bit of the mantissa after the binary point should always be a 1 for positive numbers, or a 0 for negative numbers, and the bit before the binary point should be the opposite. In other words, a normalised positive floating point number must start 0.1 and a normalised negative floating point number must start 1.0.

Rounding errors

When working with decimal numbers, we will get rounding errors. For example, 1/3 in decimal is 0.3 recurring. We are comfortable with using 0.33 or perhaps 0.333 to represent 1/3.
Obviously, there is a degree of error in this approximation. Whether it is acceptable or not depends on what our program is doing. A similar phenomenon occurs with binary representation. For example, if you try to convert 0.1 into binary you will find that you get a recurring number, so it is not possible to represent it exactly.
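You can see this recurring-bits problem directly in Python, whose floats are binary floating point. A small demonstration (the standard-library decimal module is used here only to display the exact stored value):

```python
from decimal import Decimal

# Decimal(float) shows the exact binary value Python actually stores for 0.1.
# It is not exactly one tenth, because 1/10 has recurring bits in binary.
stored = Decimal(0.1)
print(stored)  # 0.1000000000000000055511151231257827...

# The small errors accumulate, so this familiar check fails:
print(0.1 + 0.2 == 0.3)  # -> False
```

This is the binary equivalent of writing 0.333 for 1/3: each stored value is the nearest representable approximation, and whether the error matters depends on the program.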
Absolute and relative errors

The absolute error is the actual mathematical difference between the answer and the approximation of it that you can store. For example, if a calculation produced the number 1.65746552 but we could only store seven decimal places, we would have to either round or truncate it: 1.65746552 would become 1.6574655. To work out the absolute error we subtract the two values, giving an absolute error of 0.00000002. Note that the absolute error is always a positive number.

Relative error looks at the value that was being stored and decides on a relative margin of error. In this way you are comparing the actual result to the expected result:

Relative error = absolute error ÷ number intended to be stored