Physics 306 Computing Lab 5: A Little Bit of This, A Little Bit of That

1. Introduction

You have seen situations in which the way numbers are stored in a computer affects a program. For example, in the first lab, you calculated n! for larger and larger n. When the numbers got very large, the computer started producing incorrect results: numbers which were too small, and even negative numbers. These resulted from the way that the computer stores and manipulates numbers.

Today you will explore how numbers are stored by the computer when running a C program. The technical standard for the C language does not specify how numbers are stored. This is traditionally a matter left to the designers of the hardware and operating system of an individual computer. In practice, most machines, and most computer languages, use formats and storage methods very similar to those you will see today.

2. Binary numbers

The digital hardware inside a computer uses binary logic, zeros and ones. All numbers are stored as binary numbers and all arithmetic is done in binary.

Recall that, in base ten, a number like 8325 means (reading right to left) 5 ones plus 2 tens plus 3 hundreds plus 8 thousands. The columns (ones, tens, hundreds, thousands) are powers of ten. Similarly, in binary, the numerical columns are powers of two (ones, twos, fours, eights, sixteens, etc.), so that 10011 would be, reading right to left, 1 one, 1 two, 0 fours, 0 eights, and 1 sixteen, for a total of 19. Thus 10011 in binary is 19 in decimal. The individual digits in a binary number are called bits. If you get confused about binary numbers, ask for help.

Enter and compile the following program.

    #include <stdio.h>

    int main()
    {
        int a;
        int i;

        printf("enter a:\n");
        scanf("%d", &a);
        /* print the 32 bits of a, leftmost bit first */
        for (i = 31; i >= 0; i--)
            printf("%d", ((a >> i) & 1));
        printf("\n");
        return 0;
    }
There are two things in this program which may be new to you. First, the loop runs backwards. It starts with i at 31, decreases it by 1 in each iteration of the loop (that's what i-- does), and runs as long as i >= 0.

The second new thing is the expression

    ((a>>i)&1)

This is an expression which manipulates the individual bits of numbers. We will discuss it later. For now, just run the program. Enter a number and the program will print it out as a binary number. Try a few numbers and convince yourself that it works.

Notice that it prints out 32 bits. That is the number of binary digits in which an int variable is stored. In fact, only the rightmost 31 bits are used to store the size of the number. The leftmost bit, called the sign bit, is 0 if the number is positive (or zero) and 1 if the number is negative.

The story is a little more complicated than that. Try entering a negative number into the program. What happens?

Hmmm, strange. The binary representation of -1 is 11111111111111111111111111111111. The binary representation of -2 is 11111111111111111111111111111110. The binary representation of -3 is 11111111111111111111111111111101, and so on.

Arithmetic within the negative numbers works as you would expect. For example, -3 + 1 = -2:

    -3    11111111111111111111111111111101
    +1   +00000000000000000000000000000001
          --------------------------------
    -2    11111111111111111111111111111110

But why does -1 expressed in binary have a one in every column? There is a good reason for this. Think about what properties the computer representation of -1 should have. One important property of the number -1 is that when you add +1 to it, you get zero. With the choice of 11111111111111111111111111111111 to represent -1, adding the binary patterns for -1 and +1 gives the following:

    -1    11111111111111111111111111111111
    +1   +00000000000000000000000000000001
          --------------------------------
         100000000000000000000000000000000

(Do you see how that arithmetic works? There is a lot of carrying involved.)
Notice that the result is thirty-three digits long. However, the computer only stores 32 bits. The computer does not keep track of the thirty-third digit (the new, leftmost one). This digit simply disappears. The remaining 32 bits are all zeros, which is exactly what we want. So, to the computer, the arithmetic looks like this:

    -1    11111111111111111111111111111111
    +1   +00000000000000000000000000000001
          --------------------------------
     0    00000000000000000000000000000000

In other words, -1 + 1 = 0, as expected.
This method for storing negative integers, which is used in virtually all modern computers, is called "two's complement" representation. Here's a recipe for calculating a two's complement representation, using -4307 as an example.

(i) Write out the binary pattern for +4307.
(ii) Change all the 1s to 0s and all the 0s to 1s (this is called the "one's complement" of the number).
(iii) Add 1 (to get the "two's complement").

    4307               00000000000000000001000011010011
    one's complement   11111111111111111110111100101100
    plus one          +00000000000000000000000000000001
                       --------------------------------
    -4307              11111111111111111110111100101101

3. How much memory?

The usual units for talking about computer storage are not bits but, rather, bytes. One byte is eight bits. At some point in the past, eight bits was the typical amount of information a computer processor could handle at once. These days, thirty-two bits (four bytes) and sixty-four bits (eight bytes) are common.

C can tell you how many bytes it takes to store a variable.

    #include <stdio.h>

    int main()
    {
        int a, n;

        n = sizeof(a);   /* size of the variable a, in bytes */
        printf("An integer is stored as %d bytes\n", n);
        return 0;
    }

This prints the size of a in bytes.

The C language allows integers which are longer or shorter than this. You might use a longer integer if you need more digits of precision. You might use a shorter integer if storage space is at a premium. The four types of integers allowed in C as implemented by gcc on our iMacs are short, int, long, and long long:

    short a;
    int b;
    long c;
    long long d;

Write a quick program which prints out the length of each of these types of variable. You will find that two of them are identical to each other. The C language itself promises little beyond an ordering: short must be shorter than or equal to int, int must be shorter than or equal to long, and so on.

What about arrays? Define an array of, say, 100 integers:
    int n[100];

What is the size of this array (use your program to find out)? Is it what you expect? What if you make an array of long long variables?

4. Bitwise operations

Let's get back to the expression ((a>>i)&1), which was in your first program. This is an example of manipulating the bits of a number.

First let's tackle a>>i. This means to take the number a and shift each bit to the right by i places. For example, a>>1 would shift each bit to the right by one place. Thus whatever was in the eights column would end up in the fours column, whatever was in the fours column would end up in the twos column, etc.

    a      01110010101110110111111111110101
    a>>1   00111001010111011011111111111010

This is exactly what you would get if you divided a by 2 (discarding any remainder).

The other operator in that expression, &, is called "bitwise and". A typical use is c = a & b. This takes the two numbers a and b and compares them bit by bit. If a given bit is 1 in both a and b, then it is set to 1 in the output number (hence the name "and"). If it is 0 in either of the input numbers, then it is set to 0 in the output number.

    a       11110010101110110111111111110101
    b       00100101000100111110011111010001
            --------------------------------
    c=a&b   00100000000100110110011111010001

Write a program to explicitly demonstrate the bitwise and operator. Your program should have the user enter two numbers and calculate the bitwise-and combination of them. Write all three out as binary numbers, as in the above example.

The most efficient way to write such a program is to put the binary-printing loop into a separate function. Let's talk about functions for a minute. You have written functions to calculate, say, 5(sin x exp x). Such a function has an input parameter, x, and returns a value, whatever number it calculates. Whenever you call f(a) in the main program, where a is some number, the computer jumps to the function, sets x to the value of a, runs whatever calculation is in the function, and then returns to the main program.
Functions can do more than just calculate numbers. They can include any standard C commands (variable declarations, loops, writing output, etc.). They can return a value, but are not always required to. (Some purists will point out that, technically, a function always returns a value. But if we don't set a return value, and we don't look for a return value, does it matter?)

Here's a simple example of a function. Whatever value it is given, it prints ten times. That's a silly thing to do. This is just meant to show how functions work. It is not meant to be practical.
    #include <stdio.h>

    /* print x ten times; no return value is set (see the note above) */
    int f(int x)
    {
        int i;

        for (i = 0; i < 10; i++)
            printf("%d\n", x);
    }

    int main()
    {
        int a, b, c;

        printf("enter a b c separated by spaces: \n");
        scanf("%d %d %d", &a, &b, &c);
        printf("i will now print each of these ten times:\n");
        f(a);
        f(b);
        f(c);
        return 0;
    }

Calling a function is simple: just use a command like f(a);. Make sure you understand how this program works before continuing on.

Getting back to the matter at hand, write a function which, when given an integer, writes it out in binary. This is tantamount to moving the loop in the first program in this lab into a separate function, along with the appropriate declarations. Then write a main program which has the user enter two numbers, a and b, calculates the bitwise and, c = a&b, and writes out a, b, and c in binary form. (Confused? Get help!)

Once that program is working, copy it and modify it to demonstrate the right-shift operator, i.e., to calculate c = a >> b. Read in some number, shift it by some number of digits, and print out the original number and the shifted number in binary.

When a number is shifted to the right, the computer has to decide what to put in the leftmost digit (which is otherwise empty). Try shifting both positive and negative numbers, and look at what the computer puts in the leftmost digit in each case. Can you see why this might be useful? (Hint: in binary, shifting to the right by one place is equivalent to dividing by 2.)

Now that you've experimented with the shift and bitwise-and operators, take another look at the code you have been using. Do you understand how these lines print out a number in binary?

    for (i=31; i>=0; i--)
        printf("%d", ((a>>i)&1) );
There are other bitwise operators. Left shift, a<<n, shifts bits to the left. Bitwise or, a|b, sets each bit of the output to 1 if either that bit of a or that bit of b is 1. Bitwise exclusive or (sometimes called "bitwise xor"), a^b, sets each bit of the output to 1 if either that bit of a or that bit of b is 1, but not if both are 1.

5. Floating point numbers

When working with large or non-integer numbers, you have to use float variables. These are stored as patterns of bits, but in a different way than integers.

What does a binary float number look like? Consider how we write decimal numbers in exponential form. The number 26.125 is written 2.6125 × 10^1. We can express numbers similarly in binary, except that the digits are all 0s and 1s, and the exponential is 2^n instead of 10^n.

Here's an example. The decimal number 26.125 is 11010.001 in binary. Do you see why? The integer part, 26, is 11010. The fractional part, 0.125, is one eighth. With decimal numbers, the digits to the right of the decimal point are tenths, hundredths, thousandths, and so on. With binary numbers, the digits to the right of the decimal point are halves, quarters, eighths, and so on. Hence one eighth, in binary, is 0.001, and 26.125 is 11010.001.

In exponential notation, the binary number 11010.001 is 1.1010001 × 2^4. The 4 in the exponent is because the decimal point is shifted over by four places.

Notice that nonzero binary numbers written in exponential notation always have a one to the left of the decimal place. Think about it. Try converting a few numbers into binary to convince yourself that this is true. Mathematica can convert numbers to binary; for example: BaseForm[26.125,2]. To ensure that the result is in exponential notation, write, for example, BaseForm[ScientificForm[26.125],2].

So, what does the binary number 1.1010001 × 2^4 look like in the computer's memory? It has three parts:

The sign of the number, 0 for positive numbers and 1 for negative numbers.
The exponent, 4 in this example. Exponents are stored as positive integers. In order to allow for negative exponents, some fixed number (called the "bias") is added to the exponent before it is stored. On our computers, 8 bits are used to store the exponent part of float variables, so the stored exponent numbers run from 0 to 255 (since 255, which is 11111111 in binary, is the largest possible 8-bit number). The bias is 127. So, in the example number, the exponent as stored in the computer memory is 4+127=131, which is 10000011 in binary.

The fractional part, 1.1010001. This is called the mantissa. Since it always begins with a 1, there is no reason to actually store the 1 in computer memory. Just the part after the decimal point, 1010001, is stored. In our computers, the stored part of the mantissa is 23 bits long. Numbers with fewer than 23 digits are padded with zeros on the right side; this gives 10100010000000000000000 in our example.
The bits are stored in the order just listed: first the sign (1 bit), then the exponent (8 bits), then the mantissa (23 bits). Thus the number stored in memory looks like:

    0 10000011 10100010000000000000000

where the three groups store the sign, the exponent, and the mantissa. Of course, if you print out the number, you don't see the grouping; it just looks like:

    01000001110100010000000000000000

Writing out the bit pattern of a floating point number takes a little bit of trickery. The problem is that the bit-shift operator and the bitwise-and operator operate on integers, but not floating point numbers. What you need to do is to store a floating point number in memory, and then trick the computer into thinking it is an integer, so that you can use the binary-number-printing function you wrote earlier. Here is how you can do it:

    float a;
    int b;
    ... set a to some real number ...
    b = *(int *)&a;
    ... print out the binary pattern in integer b ...

How does b = *(int *)&a work? (Don't worry if you don't follow this.) The variable a holds the floating point number. The expression &a refers to the location in memory (the "address") where that number is stored. This is called a pointer. (Confusingly, when & operates on two variables, such as a & b, it is the bitwise-and operator, but when it operates on a single variable, &a, it returns a pointer to that variable. These two uses of & have nothing whatsoever to do with one another.) The value returned by &a is, specifically, a pointer to a float. The operator (int *) turns it into a pointer to an integer. (These are different variable types in C, although in practice the number (int *)&a is probably identically equal to &a.) Finally, the * on the far left follows the pointer to an integer and retrieves the value stored there, treating it as an integer. This integer has the same bit pattern as the original floating point number.
The use and manipulation of pointers is an important part of programming in C, but we won't use pointers much in this course.

Write a program to print out the bit patterns of floating point numbers. Try it with 26.125. Try it with -26.125. Try it with 13.0625, which is 26.125 divided by two. Try it with a few other positive and negative numbers and see if the bit patterns make sense.

What is the largest number you can store? What happens if you try to do arithmetic which is invalid, such as a = 1./0. or sqrt(-1), or if you multiply two large numbers together so that the result cannot fit in a 32-bit float? Try it, writing the result of the arithmetic both in the normal way, printf("%f"), and as a bit pattern.

Any floating-point number with exponent all ones (11111111) represents either inf ("infinity") or nan ("not a number"). Either of these means that an invalid arithmetic operation was attempted or that the result of an arithmetic operation was out of the range of float variables.
In addition to float variables, C has double and long double variables. On our computers, the format of doubles is the same as the format of floats, except that they have 11 bit exponents (with bias 1023) and 52 bit mantissas; along with the sign bit, this gives a total of 64 bits or 8 bytes. The format of long doubles is different. They are actually two doubles, stored one after another, one of which is much smaller than the other (by a factor of around 2^52 or 10^16). The total value of the long double variable is the sum of the two doubles.

6. Character variables

So far we have been considering how computers store numbers. What if we want to store a word, or a sentence? How does the computer store "hello"? In computer-speak, this is a string. In C, a string is an array of characters, so we had better talk about characters first.

There is a C data type called char, which holds exactly one character:

    char a;
    a = 'A';
    printf("%c\n", a);

A string is an array of characters. For example:

    #include <stdio.h>

    int main()
    {
        char a[4];

        a[0] = 'A';
        a[1] = 'B';
        a[2] = 'C';
        a[3] = '\0';   /* marks the end of the string */
        printf("%s\n", a);
        return 0;
    }

The last character, '\0', means "end of string". Strings normally end with this character so that commands which use the string, such as printf(), know where the string ends.

You don't always have to refer to a string one character at a time. For example, the following declares the array a, fills it with ABC, and adds a '\0' at the end. When the size of the array is not specified inside the square brackets, C will make the array just large enough to hold whatever string is specified ('A', 'B', 'C', '\0' in this instance).

    char a[] = "ABC";

You can read a string from the keyboard:

    char a[80];
    scanf("%s", a);   // notice: no ampersand before the variable name

Perhaps surprisingly, the following does not work.
    char a[80];
    a = "Hello world!";   // Does not work

This bit of code doesn't work because C is not able to manipulate the entire array of characters in a single statement. Instead, there is a function, strcpy(), which copies one string into another string. It is declared in <string.h>. The following copies "Hello world!" into string a:

    char a[80];
    strcpy(a, "Hello world!");

Characters are one byte long. (Try sizeof(char).) The program below will read in a string and type out the 8-bit binary patterns used to store each character, along with their numerical values in base 10. Try it out. Can you find patterns, e.g., what 8-bit numbers are used to represent 'A', 'B', 'C', and so on? What 8-bit numbers are used to represent 'a', 'b', and 'c'? What about digits '1', '2', etc.? Can you figure out how the program works?

    #include <stdio.h>

    /* print the rightmost 8 bits of x, leftmost bit first */
    int printbyte(int x)
    {
        int i;

        for (i = 7; i >= 0; i--)
            printf("%d", ((x >> i) & 1));
    }

    int main()
    {
        char a[80];
        int b;
        int i;

        printf("enter a string:\n");
        scanf("%s", a);   /* no ampersand: a is an array */
        i = 0;
        do {
            b = a[i];
            printbyte(b);
            printf(" %3d %c\n", b, a[i]);
            i++;
        } while (b != 0);   /* stop after printing the final '\0' */
        return 0;
    }
The floating-point numbers you saw today are in a format called the "IEEE standard for binary floating-point arithmetic" or "IEEE 754". (The IEEE is the Institute of Electrical and Electronics Engineers, a professional organization for people in these fields.) Wikipedia has a good article on IEEE 754 formats and arithmetic.

The character variables you saw in section 6 are in a format called ASCII (American Standard Code for Information Interchange). You can find tables of ASCII codes on the internet.