A brief introduction to C programming for Java programmers

Sven Gestegård Robertz <sven.robertz@cs.lth.se>
September 2017

There are many similarities between Java and C. The syntax of Java is basically C syntax that has been adapted and extended. Thus most of the arithmetic operations and control flow statements (if, while, for, etc.) are the same. However, despite the similar syntax the languages are quite different, and this document outlines some important differences between Java and C. This is not a C programming tutorial, and it is by no means a complete list of the differences between C and Java; it is a set of pointers to topics that a Java programmer may find difficult or confusing when first encountering C.

C is not object-oriented. There are no classes in C; the top-level building block is the function. However, C does allow grouping variables in a struct, which is like a Java class with public attributes and without methods. Operations on that struct can then be expressed as functions that take a pointer to the struct as an argument (much like the implicit this parameter to Java methods). The main things that have no direct counterpart in C are inheritance and polymorphism.

A pointer is a variable containing the address of a variable. In C, the use of pointers is a common source of confusion, starting with the syntax, where the prefix operator * is used in both declarations and expressions, with different meanings. In declarations, prefix * means "is pointer to":

    int *x;       // x is a pointer-to-int

In expressions, prefix * means "contents of" (dereference):

    int i = *x;   // i gets the value that x points to

and prefix & means "address of" (getting a pointer to a variable):

    int *y = &i;  // y points to the variable i

Note that *y is an int variable, as y is a pointer to int. Thus the statement *y = 5; means that the variable that y points to (in this case i) is assigned 5.

C has no runtime safety net.
In Java, the JVM performs runtime checks to detect or prevent certain types of errors. For instance, if you attempt to access an element outside an array (e.g., trying to write the tenth element of an array of size 5, like int a[5]), you will get an ArrayIndexOutOfBoundsException. In C, you get undefined behaviour, which often means that the program crashes due to memory corruption, often at a later point in time. The same applies to type casts: in Java, if you attempt to cast a variable to an incompatible type you get
a ClassCastException. In C, the programmer is responsible for ensuring that a type cast makes sense. If you, for instance, convert a pointer to a pointer to a larger data type and assign through it, you will probably get memory corruption and undefined behaviour. The following snippet illustrates this:

    char c;
    int *p = &c;
    *p = 0;

Here, the problem is that the size of the variable c is one byte, but the size of an int is typically 4 bytes. Therefore, when *p = 0 is executed, four bytes are written to a variable of size 1, overwriting three bytes of memory adjacent to the variable c. If you enable warnings, the compiler will give a warning: initialization from incompatible pointer type.

In C, function arguments can be passed by value or by reference. In Java, arguments of primitive types are always passed by value, and objects (class instances) are always passed by reference. In C, you can choose (with the exception that arrays cannot be passed by value [1]). That means that you can use pass-by-reference (by passing a pointer) for any data type. This is commonly used in the system libraries, where functions like

    ssize_t read(int fildes, void *buf, size_t nbyte);

take an output parameter (in this case buf) that is used to provide the result to the caller. Example:

    #include <stdio.h>

    void f_by_value(int x)
    {
        printf("f_by_value: x = %d\n", x);
        x += 10;    // changes only the local copy
        printf("f_by_value: x = %d\n", x);
    }

    void f_by_reference(int *x)
    {
        printf("f_by_reference: x = %d\n", *x);
        *x += 10;   // changes the caller's variable
        printf("f_by_reference: x = %d\n", *x);
    }

    void example()
    {
        int x = 10;
        printf("example: x = %d\n", x);
        f_by_value(x);
        printf("example: x = %d\n", x);
        f_by_reference(&x);
        printf("example: x = %d\n", x);
    }

Here, calling f_by_value() does not change the value of x in the calling function, whereas calling f_by_reference() does.

[1] If you really want to pass an array by value, you can define a struct containing an array, as structs can be passed by value.
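The struct trick mentioned in the footnote can be sketched as follows. The names int5, zero_first_by_value and zero_first_by_pointer are made up for this illustration; the only point is that the callee receives a copy of the whole struct, array included, whereas a plain array argument gives the callee access to the caller's array.

```c
/* Hypothetical wrapper type: a struct whose only member is an array. */
struct int5 {
    int v[5];
};

/* The struct, and the array inside it, is copied into the parameter a. */
int zero_first_by_value(struct int5 a)
{
    a.v[0] = 0;        /* modifies the local copy only */
    return a.v[0];
}

/* A plain array argument arrives as a pointer to the first element,
   so the caller's array is modified. */
void zero_first_by_pointer(int *a)
{
    a[0] = 0;
}
```

After struct int5 s = {{1, 2, 3, 4, 5}};, calling zero_first_by_value(s) leaves s.v[0] equal to 1, while zero_first_by_pointer(s.v) sets it to 0. Note that copying a large struct on every call has a cost, which is one reason pointers are usually preferred.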
C does not have function overloading. You cannot have two functions with the same name. Also, the namespace is flat, so all function names must be unique. Beware of name clashes with functions from the standard and system libraries, as common names like open, close, exit, shutdown, etc., exist there. A common way of naming functions is to prefix their names with, e.g., a module name (mydevice_init() instead of just init()).

C arrays decay to a pointer to the first element. You often read that a C array is simply a pointer to the first element. While that is true in many situations, there is an important distinction. When you declare an array variable (like int a[5];) you allocate space (for a local variable, in the current function) for an array of five ints (i.e., an object whose size is 5*sizeof(int)) and bind the name a to it. If you declare int *p; you simply allocate space for the pointer, and no array. Thus, when talking about objects, or variables, an array and a pointer are different types: in the given example, a is an array of five ints, and p is a pointer to int.

However, if you use the name of an array variable (a) as an argument to a function taking a pointer (like void f(int*)), in an expression like f(a);, then what gets passed to the function is not the entire array, but a pointer to the first element (as if you had written f(&(a[0]));). This is known as array decay.

Here, the syntax of C may be a bit confusing, as an array parameter to a function can be written either int a[] or int *a. This is just syntactic sugar, and the function prototypes

    void f(int a[]);
    void f(int *a);

are equivalent: in both cases, a is a pointer to int.

Adding to the confusion, the length of an array can be omitted in an array declaration where the array is also initialized. For instance, the declaration

    char s[] = "Hello";

is equivalent to

    char s[6] = "Hello";

as the compiler uses the length from the supplied initial value.
In this case, the variable s is an array of chars, and not just a pointer to char.

C arrays contain no length information. All functions that take an array as a parameter just get a pointer to the first element (array decay). Thus, there is no built-in information about the length of the array, and such functions also need a length parameter. It is the responsibility of the programmer to ensure that the length passed is correct.

A word of warning: C array variables carry no length information at run time, but the compiler does know the length of an array. This information is, however, only available in the scope where the array variable is declared (typically a function). Sometimes you see code like the following:

    int a[] = ...;  // some initializer
    int i;
    for (i = 0; i < sizeof(a)/sizeof(*a); ++i) {
        // do something with a[i]
    }

This idiom is dangerous unless you know exactly when it works as intended. As stated above, sizeof applied to an array variable gives the size (in bytes) of the array, and thus the expression sizeof(a)/sizeof(*a) is the number of elements in a (the size of the entire array divided by the size of one element).
However, if a were a pointer, sizeof(a) would give the size of a pointer, and not the size of whatever it points to (as that cannot be determined without run-time information). Thus, the above idiom of using sizeof to get the length of an array only works in the same scope as the declaration of the array variable. To illustrate the pitfall, the following code

    #include <stdio.h>

    void f(char a[])
    {
        printf("f: The length of %s is %zu\n", a, sizeof(a));
    }

    int main()
    {
        char s[] = "Hello, world!";
        printf("main: The length of %s is %zu\n", s, sizeof(s));
        f(s);
    }

will give the output

    main: The length of Hello, world! is 14
    f: The length of Hello, world! is 8

as s in main() is an array and thus sizeof gives the actual size (13 characters plus the terminating null), but in f(), a is a pointer and sizeof gives the size of a pointer (in this case 64 bits = 8 bytes). With warnings enabled, the compiler will give the (somewhat cryptic) warning: sizeof on array function parameter will return size of char * instead of char []

C has pointer arithmetic. In C, a pointer is a variable containing the address of another variable of a specified type. C pointers support arithmetic operators, and the most common use of them is that array indexing is tightly connected to pointer arithmetic. Consider the following code, which fills an array with zeroes:

    int a[10];
    int i;
    for (i = 0; i < 10; ++i)
        a[i] = 0;

Here, the expression a[i] is equivalent to *(a+i) (i.e., the contents of the ith element of a). Thus, the assignment to array element i can also be written as *(a+i) = 0;. Written this way the code is mostly cryptic, but pointer arithmetic is commonly used when passing parts of arrays to other functions. E.g., assume we have char buf[BUFSIZE]; and we want to read some data from a file descriptor fd into the buffer buf, starting at position pos (for instance, to continue reading a message after reading the first pos bytes of it).
We can do that with the statement

    read(fd, buf + pos, BUFSIZE - pos);

and here, the expression buf+pos is, arguably, clearer than the equivalent expression &(buf[pos]).

In order for this to work, adding a number k to a pointer means adding k times the size of the pointed-to type to the address. For example, if the array a in the above example is stored at address 0x10000, and sizeof(int) is 4, the value of the expression a+2 would be 0x10008. This is what makes the expression a[i] equivalent to *(a+i). Incidentally, it also means that you can write array indexing "backwards": the expressions a[5] and 5[a] are equivalent (both meaning *(a+5)), although one obviously shouldn't use the latter.
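As a small illustration of the scaling rule described above, the following sketch walks a pointer through an array and measures the distance in bytes between a and a+k. The function names sum and byte_offset are invented for this example and do not come from any library.

```c
#include <stddef.h>

/* Sum n ints by stepping a pointer through the array instead of indexing. */
int sum(const int *a, size_t n)
{
    const int *p;
    const int *end = a + n;   /* points one past the last element */
    int total = 0;

    for (p = a; p != end; ++p)
        total += *p;          /* ++p advances by sizeof(int) bytes */
    return total;
}

/* Distance in bytes between a and a+k: k times the element size. */
size_t byte_offset(const int *a, size_t k)
{
    return (size_t)((const char *)(a + k) - (const char *)a);
}
```

Because an array argument is just a pointer to its first element, a call like sum(a + 1, 3) sums the elements a[1] to a[3], in the same way as buf + pos selects the rest of a buffer in the read() call.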
C strings are null-terminated character arrays. In Java, the String class contains information about the length of a string and methods for copying, assigning, concatenating, comparing, etc., strings. In C, a string is simply an array of characters terminated with a null character (i.e., the integer value zero), so for instance the string "hello" is an array of 6 chars (the five letters plus a null). Copying, concatenating and comparing strings is done using standard library functions (e.g., strncpy, strncat and strncmp).

Another subtle detail about strings is that you can define a string variable in two ways:

    const char *s1 = "hello";
    char s2[] = "hello";

where s1 cannot be modified (as it is just a pointer pointing to a string literal, which may be stored in read-only memory) whereas s2 can (as it is a char array allocated as a local variable and initialized with the string literal). Sometimes one sees char *s = "hello"; (without the const), but this should be avoided, as changing the string pointed to by s is undefined behaviour.

C has no boolean type. [2] Instead any integer value can be used in a boolean context. Here zero is interpreted as false, and non-zero as true. This means that any integer expression can be used in a boolean context. A common mistake is to write an assignment (which is an expression) instead of a comparison. For instance, the snippet

    int x = 5;
    if (x = 0)
        printf("x is zero\n");
    else
        printf("x is %d\n", x);

will print x is 0, as x = 0 is an assignment and its value is zero, which is interpreted as false in the boolean context, and thus the else branch is chosen. This mistake is quite common, so if you turn on compiler warnings (e.g., with the option -Wall) you will get a warning if you use an assignment in a boolean context.

C has both signed and unsigned integer types. Unlike Java, whose integer types (apart from char) are all signed, C has both signed and unsigned types.
For instance, the type signed char typically has the value range [-128, 127], whereas unsigned char has the range [0, 255]. For normal integer arithmetic this is of little importance, and you can just use the signed types (like int) to represent numbers. On the other hand, if you want to represent, or manipulate, bit patterns, using unsigned types is recommended, as that avoids problems related to sign extension. Also, with signed types the behaviour of the right shift operator (>>) when applied to a negative number (i.e., whether a zero or a one is shifted in) is implementation defined, whereas right shift of an unsigned integer always shifts in zeros.

[2] The type bool was introduced in the C99 standard (as _Bool, with the name bool provided by <stdbool.h>), but that is still regarded as a new standard and still not directly supported by many C compilers. And what is said here about integers as boolean values is true also in C99.
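The remarks about sign extension and right shift in the section on signed and unsigned types can be made concrete with a small sketch. The function names are invented for this example, and uint32_t from <stdint.h> (a C99 header) is assumed to be available.

```c
#include <stdint.h>

/* Widening a char-sized value to int: a negative signed char stays
   negative (sign extension), an unsigned char never becomes negative. */
int widen_signed(signed char c)
{
    return c;
}

int widen_unsigned(unsigned char c)
{
    return c;
}

/* Extract the most significant byte of a 32-bit pattern. As the operand
   is unsigned, >> shifts in zeros from the left. */
uint32_t top_byte(uint32_t bits)
{
    return bits >> 24;
}
```

On a typical two's-complement machine the bit pattern 0xFF is -1 as a signed char and 255 as an unsigned char, so widen_signed((signed char)-1) gives -1 while widen_unsigned(0xFF) gives 255. With top_byte(0x80000000u) the result is 0x80; had bits been a negative signed int, the result would have depended on the implementation.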
C has no automatic memory management. In Java, you allocate objects with new, and when an object can no longer be reached by the program the memory used by the object is automatically reclaimed by the garbage collector. In C, the programmer is responsible for freeing memory when it is no longer used, and failure to do so will cause a memory leak, and eventually lead to an out-of-memory error. For this reason, the recommendation is to use stack allocation (i.e., use local variables in functions) whenever possible. Make sure that an object lives at least as long as every pointer that points to it. In particular, do not return pointers to local variables.

C has no exception mechanism. Error handling has to be done explicitly. Often, the return value of a function is used to indicate success or failure. For instance, a function like read() returns the number of bytes read on success, or a negative value (-1) on failure. Functions returning pointers often return NULL to indicate failure. A third option is to just return an error code: often 0 (i.e., false) if there is no error, and non-zero (true) on error. The typical use of such a function is

    if (my_function_that_may_fail()) {
        // handle error
    }

or, if the user needs to handle different error codes differently,

    int result = my_function_that_may_fail();
    if (result == ERROR1) {
        // handle error type 1
    } else if (result == ERROR2) {
        // handle error type 2
    }

Turn on compiler warnings. C has many pitfalls, and some legal constructions are often not what the programmer intended (a common example is writing if(a = 0) instead of if(a == 0)). To get as much help as possible from the compiler it is recommended to enable warnings, and often also to make the compiler treat warnings as errors (with -Werror). With gcc and clang, the following set of options is a reasonable default:

    -Wall -Werror -pedantic -pedantic-errors
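As a closing sketch, the conventions from the sections on memory management and error handling can be combined in one function, which should compile cleanly with the options above. The function copy_string and its error codes are hypothetical names chosen for this illustration: it returns 0 on success and a non-zero error code on failure, delivers its result through an output parameter, and leaves it to the caller to free the memory.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical error codes for this sketch. */
enum { OK = 0, ERR_BAD_ARG = 1, ERR_NO_MEMORY = 2 };

/* Copy the string src to newly allocated memory. On success, *out points
   to the copy (which the caller must free()) and OK is returned. On
   failure, *out is left untouched and a non-zero code is returned. */
int copy_string(const char *src, char **out)
{
    size_t len;
    char *copy;

    if (src == NULL || out == NULL)
        return ERR_BAD_ARG;

    len = strlen(src) + 1;    /* +1 for the terminating null */
    copy = malloc(len);
    if (copy == NULL)         /* malloc signals failure with NULL */
        return ERR_NO_MEMORY;

    memcpy(copy, src, len);
    *out = copy;
    return OK;
}
```

A caller would declare char *s; and write if (copy_string("hello", &s)) to reach its error handling, and call free(s) once the copy is no longer needed. Returning a pointer to a local char array instead would be an instance of the lifetime error warned about above.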