CAAM 420 Notes
Chapter 2: The C Programming Language
W. Symes and T. Warburton
October 2, 2013

22 Stack Overflow

Just for warmup, here is a remarkable memory bust:

    /*
      Author: WWS
      Purpose: illustrate stack overflow - most systems have
      strict limit to the size of the stack, which includes
      the region in which automatic variable memory
      resides.
    */

    // length of a really long array
    #define N 1600000
    // comment this out to see stack overflow
    //#define DYN

    #include <stdio.h>
    #ifdef DYN
    #include <stdlib.h>
    #endif

    int main() {

      int i;
      // the scanf pause trick
      scanf("%d",&i);

      // dynamic allocation...
    #ifdef DYN
      double * x = (double *)malloc(N*sizeof(double));
    #else
      // vs automatic allocation of the same workspace
      double x[N];
    #endif
      // now try to access
      for (i=0;i<N;i++) x[i] = 0.0;
    }
It's called a stack overflow. The stack - the part of process memory that holds function call information, along with a data region for automatic memory - has a limited size. The system utility ulimit, run on my MacBook Pro, shows

    $ ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) 6144
    file size               (blocks, -f) unlimited
    max locked memory       (kbytes, -l) unlimited
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 256
    pipe size            (512 bytes, -p) 1
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 266
    virtual memory          (kbytes, -v) unlimited

(generally if you use tcsh, limit will do the same thing as ulimit -a under bash - see this web site for info). Thus allocating much more than 8 MB of automatic data is likely to cause trouble. As it turns out, to cause real mayhem you need to allocate somewhat more than the limit, and 1.6 million doubles at 8 bytes each = 12.8 MB does it. Notice that not even the first instruction gets executed - the command cannot even load!

23 File I/O

Up to this point, the only method I've shown you for moving data into or out of files is via input or output redirection. That's not an adequate toolset for many scientific programming tasks in which file data must be manipulated during runtime. K&R cover this topic in section 7.5.

The structural model underlying file i/o is that of a linear memory segment: a file is treated as an array of bytes, and the position of a byte within the file is measured by its offset (distance in bytes) from one of several reference positions within the file (beginning, end, current).
The critical steps in file i/o, and the functions that perform them, are:

    fopen:            opening the file - making it accessible to the process
    fseek, ftell:     changing, reporting position (of the next byte to be read/written) within the file
    fscanf, fprintf:  read, write ascii data
    fread, fwrite:    read, write binary data
    fflush, fclose:   ensure that data is physically on the disk or other media, detach the file from the running process

Before describing the basic i/o facilities, I should mention that these functions (all names begin with f) form the standard i/o library. Some of them are described in Ch. 7 of K&R, several are not. They are by far the most useful i/o functions in C. Another, low-level i/o library exists, with some of the same basic functionality and function names without the f prefix - this other library is described in section 8.2. For various reasons I won't go into, the standard library is usually preferable.

The standard i/o functions may be used in a lot of different ways, which are hard to remember - so it's nice that the man page on fopen lays them out clearly: if you forget, just enter man fopen, for example.

A few remarks:

1. fopen: the return value of this function is a pointer to a special kind of structure (later this week!) called a file pointer (older terminology) or stream (newer, C++-motivated terminology), of type FILE *. Note: there is never any need to deal with FILE objects directly - only through pointers to them. Every other library function will use the FILE * value returned by fopen. If fopen fails (to make the file available, or to open it), it returns the null pointer, which has the macro definition NULL and can be tested as a boolean.
   fopen requires two arguments: (1) the file name (a string) and (2) a permission string, which may be one of "r", "r+", "w", "w+", and a couple of other not-so-useful options.

   "r":  non-destructive, opens file for reading only and places stream (file pointer) at the beginning. If the file doesn't exist, this option fails (returns a NULL pointer). Use this option to read (only) a file that should exist.

   "r+": non-destructive, opens file for reading and writing and places stream at beginning. Also fails if file does not exist. Use this option to access an existing file that you may also want to alter (write).

   "w":  destructive, truncates the file (i.e. trashes its contents) if it exists, creates it if it does not, positions stream at start. Use this option to create a new file or overwrite an existing one; for output only.

   "w+": destructive, truncates the file if it exists, creates it if it does not, positions stream at start, permits both reads and writes. Use this option to create a new file or overwrite an existing one, when you may also need to read from it after having written to it.

2. fseek, ftell: fseek has three args: a file pointer, a long integer offset, and a base position. Three base positions are macro-ized, and are almost always the appropriate ones (usually the first): (1) beginning-of-file = SEEK_SET, (2) end-of-file = SEEK_END, and (3) current position = SEEK_CUR. It returns 0 on success. ftell takes one argument, the file pointer, and returns the offset (in bytes, of course, as a long).

3. fscanf, fprintf: well-described in K&R. Same as scanf and printf except that they read/write from/to an open file, and take the FILE * returned by fopen as first argument.

4. fread, fwrite: binary (i.e. non-human-readable) i/o. The great advantages of binary i/o are: (1) it's faster per unit information, because less translation is required, and (2) the files are smaller.
   Used to read into arrays: if T * x is initialized as an array of type T, either automatically or via a call to malloc, with length at least n, and fp is a FILE * returned from fopen with appropriate permissions, then

       nr=fread(x,sizeof(T),n,fp);

   attempts to read enough bytes for n words of type T into x, and reports that it actually succeeded in reading nr words. Similarly,

       nr=fwrite(x,sizeof(T),n,fp);

   attempts to write n words of type T from x into the file attached to FILE * fp. If nr is less than n on return, something went wrong (or, for fread, the end of the file was reached). On return from either of these functions, the stream position (fp) is wherever the next byte to be read/written is located. So you can fill a file up by writing a loop over calls to fwrite, or read it from start to finish by looping over fread, assuming that the natural size of array to read is likely to be less than the file size.

5. fflush, fclose: fclose(fp) undoes what fp=fopen(...) did - the file is no longer associated with the file pointer fp. Because standard i/o is buffered - not written word-by-word to disk, but stored somewhere until the OS finds it convenient to access the disk controller - it's sometimes necessary to flush the buffers, and that's what fflush does. For example, if you are writing error messages and want to make sure you see them even if the program crashes, follow fprintf(fp,...) with fflush(fp).

A rewrite of our previous example to write the same output, but to a file called hist.dat, illustrates some of these points:
    #include <stdlib.h>
    #include <stdio.h>
    #include <math.h>
    #include <unistd.h>   // for getpid

    int main(int argc, char ** argv) {

      if ((argc!=4)&&(argc!=5)) {
        printf("Histogram generator for sample mean of C std library function\n");
        printf("random(), scaled to produce pseudorandom numbers in the range\n");
        printf("[-1,1]. Computes mean over a sample of prescribed size, and\n");
        printf("samples the resulting distribution of the mean over a prescribed\n");
        printf("number of trials. Means scaled by square root of number of trials,\n");
        printf("so that limit of infinitely many trials is Gaussian (CLT).\n");
        printf("Counts means in each of a prescribed number of equal-sized bins\n");
        printf("(subintervals of [-1,1]). Executes this procedure a prescribed number\n");
        printf("of times [default=1], and prints the resulting histogram realizations\n");
        printf("to the file \"hist.dat\" in the form of x-y pairs, x being bin center and y being\n");
        printf("relative bin count, one pair to a line, with two blank lines\n");
        printf("between successive realizations.\n\n");
        printf("usage: [prog name] [sample size] [number of trials] [number of histogram bins] [optional number of realizations]\n");
        exit(1);
      }

      /*
        Declarations
      */
      int nsamp;    // size of each sample
      int ntrial;   // number of samples to generate
      int nbin;     // number of bins in histogram
      int nreal;    // number of realizations
      int itrial;   // sample counter
      int isamp;    // in-sample counter
      int ibin;     // bin counter
      int ireal;    // realization counter
      float * mean; // array of ntrial means
      float * hist; // array of nbin counts
      float lbin;   // workspace for left endpoint of bin
      float dbin;   // bin width
      float rmax;   // max output of random (2^31-1), as float
      float scale;  // scale factor to approximate central tendency
      FILE * fp;    // output stream

      /*
        Statements
      */

      // set random seed by process number
      srandom(getpid());

      // compute max output of random = 2^31-1 (clunky!)
      rmax=1;
      for (ibin=0;ibin<31;ibin++) rmax*=2.0f;
      rmax-=1.0f;

      // convert inputs to integers
      nsamp=atoi(argv[1]);
      ntrial=atoi(argv[2]);
      nbin=atoi(argv[3]);
      // default
      nreal=1;
      if (argc==5) nreal=atoi(argv[4]);

      // sanity tests
      if (nsamp<=0) {
        printf("Error: sample N for each mean must be > 0, was %d\n",nsamp);
        exit(1);
      }
      if (ntrial<=0) {
        printf("Error: number of means must be > 0, was %d\n",ntrial);
        exit(1);
      }
      if (nbin<=0) {
        printf("Error: number of bins must be > 0, was %d\n",nbin);
        exit(1);
      }
      if (nreal<=0) {
        printf("Error: number histogram realizations must be > 0, was %d\n",nreal);
        exit(1);
      }

      // set bin left endpoint, width
      lbin=-1.0f;
      dbin=2.0f/((float)nbin);

      // allocate workspace for histogram, sample mean vectors
      hist = (float *)malloc(nbin*sizeof(float));
      if (!hist) {
        printf("Error: failed to allocate memory for histogram\n");
        exit(1);
      }
      mean = (float *)malloc(ntrial*sizeof(float));
      if (!mean) {
        printf("Error: failed to allocate memory for means\n");
        exit(1);
      }

      // open file
      if (!(fp=fopen("hist.dat","w"))) {
        printf("Error: failed to open output file \"hist.dat\"\n");
        exit(1);
      }

      /*
        realization loop
      */
      for (ireal=0;ireal<nreal;ireal++) {

        // initialize means, histogram
        for (ibin=0;ibin<nbin;ibin++) hist[ibin]=0.0f;
        for (itrial=0;itrial<ntrial;itrial++) mean[itrial]=0.0f;

        // generate sample of ntrial sums of nsamp random nos.
        for (itrial=0;itrial<ntrial;itrial++)
          for (isamp=0;isamp<nsamp;isamp++)
            mean[itrial]+=2.0f*((float)(random())/rmax)-1.0f;

        // scale by sqrt(nsamp) per central limit theorem - combines
        // division by nsamp to convert sums to means, and multiplication
        // by sqrt(nsamp)
        scale=1.0f/sqrt((float)nsamp);
        for (itrial=0;itrial<ntrial;itrial++) mean[itrial]*=scale;

        // bin means to compute histogram
        for (itrial=0;itrial<ntrial;itrial++) {
          lbin=-1.0f;
          for (ibin=0;ibin<nbin;ibin++) {
            if ((lbin<=mean[itrial]) && (lbin+dbin>mean[itrial]))
              hist[ibin]+=1.0f/((float)ntrial);
            lbin+=dbin;
          }
        }

        // output to file - first element of each pair is center of bin
        lbin=-1.0f+0.5*dbin;
        for (ibin=0;ibin<nbin;ibin++)
          fprintf(fp,"%10.2e %10.2e\n",lbin+ibin*dbin,hist[ibin]);
        // tack on two blank lines for plotting purposes
        fprintf(fp,"\n\n");
      }

      // clean up
      free(hist);
      free(mean);
      fclose(fp);

      return 0;
    }

It also shows you how to put double-quotes in a formatted write statement.

23.1 The three standard files:

The C i/o library defines three standard files, which are always open:

    stdin  = keyboard input - can read from it, but not write to it ("r" permission, in effect);
    stdout = terminal output - can write to it, but not read from it ("w") - line buffered, that is, characters are stored until a newline is encountered, then dumped to screen - you can lose output if for some reason the command does not complete (an error causes the program to abort, for example);
    stderr = unbuffered terminal output - every character goes straight through, useful for error messages.

So fprintf(stdout,...) is equivalent to printf(...), fscanf(stdin,...) to scanf(...). For redirection, (...) > (file) captures stdout to the file, (...) >& (file) captures stdout and stderr to the file. Since many apps (the gcc compiler, for example) write error messages to stderr, you should use the second form to capture output from them.
23.2 Long files:

For files of length greater than the max long (approx. 2 GB on systems where long is 4 bytes), fseek and ftell are inadequate positioning tools, since they express offsets as longs. POSIX systems provide the off_t data type, a signed integer type wide enough to hold any file offset the system supports - for typical 64 bit machines, that's 8 bytes. The standard i/o library on these systems provides extensions fseeko and ftello, which work in exactly the same way as fseek and ftell except with off_ts instead of longs.

24 Structures

Arrays consist of some number of objects of identical type. The array data structure thus cannot accommodate an aggregation of objects of different types, which may be the natural way to express a data structure concept. For example, we have already remarked that arrays are unsatisfactory as data containers for vector data, as they do not "know their own length": as a function argument, an array (however allocated) is merely a pointer to its first member. A more satisfactory data structure would combine an integer for length and a pointer to a real type for the array data, with the presumption that the int is the length of the data array stored at the pointer (an interesting question: how would you enforce this relationship?).

C provides a struct type to accommodate this type of data structure. Struct syntax takes this form:

    struct [structname] {
      [type1] [member1];
      [type2] [member2];
      [type3] [member3];
      ...
    };

You access the members via the member access operator, denoted by a period (.). Thus for example in

    struct mystruct {
      char x;
      int y;
      float * z;
      ...
    };
    struct mystruct v;

the members x, y, and z would be accessed as

    v.x;
    v.y;
    v.z;
    ...
A full code example that also uses the typedef command to define a variable type from a C struct:

    #include <stdio.h>

    typedef struct {
      double x;
      double y;
    } point;

    int main() {

      point p;

      p.x = 1.2;
      p.y = 3.2;

      printf("p=(%f,%f)\n", p.x, p.y);

      return 0;
    }

The member access operator has high precedence - it binds more tightly than almost every other operator, so it's sometimes necessary to use parentheses to force the order of evaluation you want.

K&R give a very good discussion of structures in Ch. 6 sections 1-4, and it is required reading for this week. I provide an example relevant to scientific programming, which will be discussed in class.