Naked C Lecture 6 File Operations and System Calls 20 August 2012
Libc and Linking
  Libc is the standard C library
    Provides most of the basic functionality that we've been using
    String functions, fork, printf
  Your code is automatically linked against libc by gcc
  You can link against other libraries using the -l option to gcc
    When using -l, specify the library name minus the "lib" prefix
  Example: linking against libtrace

    $ gcc -g -Wall -o tracesplit tracesplit.c -ltrace
Libc and System Calls
  Libc functions are provided for convenience
    Implement commonly used features in C
  It is possible to write your own versions of the libc functions
    How would you implement strlen(), for instance?
  Many tasks require the C library to invoke the kernel
    File I/O, memory allocation, networking, processes
  Kernel services can be invoked using system calls
    Wrapper functions in libc allow you to make system calls
    Wrappers have the same name as the system call
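One possible answer to the strlen() question, as a minimal sketch. The name my_strlen is hypothetical, chosen only to avoid clashing with the libc version; real libc implementations are usually more heavily optimised.

```c
#include <stddef.h>  /* size_t */

/* Hypothetical re-implementation of strlen(): count the bytes that
 * come before the terminating '\0'. No system call is needed. */
size_t my_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0') {
        p++;
    }
    return (size_t)(p - s);
}
```

Calling my_strlen("hello") walks five characters before reaching the terminator, all in userspace.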
System Calls
  [Figure: layers labelled Application, Libc, Syscall Wrappers and Kernel; functions labelled in the figure: strlen, malloc, fopen, brk, fork, open]
File I/O in C
  We're going to look at two ways of doing file I/O in C
    File handles, using libc functions
    File descriptors, using syscall wrappers
  Both approaches have their place
  File handles are more convenient and easier to use
    You have encountered them already
  File descriptors are the only option for kernel-level programming
    File descriptors are also used in other situations
File Handles
  FILE is a special type that describes a file in C
    The correct term is a file handle
  File handle functions in C either take or return a FILE *
    i.e. a pointer to a FILE structure
  The contents of the FILE structure are unimportant
    The FILE structure is opaque: you will never touch it directly
File I/O
  The basics of file I/O in C
    Step 1: #include <stdio.h>
    Step 2: use fopen to open the file and get a FILE *
    Step 3: perform your I/O using functions like fgets, fprintf etc.
    Step 4: close your file using fclose
  Don't forget to check for errors as you go!
Binary Files
  Text files
    Human-readable, but can be wasteful in terms of storage
    e.g. 12345678 fits in an int but requires 8 bytes to store as text
  Binary files
    Store data in the same format as it is in memory
    e.g. an int always takes sizeof(int) bytes in a binary file (four on most current platforms)
    Can be more compact
    Faster for your program to read: no need to convert strings
    Write entire data structures to disk
    Just be careful with padding: use packed structs
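To make the padding warning concrete, here is a small sketch. It assumes GCC or Clang, since __attribute__((packed)) is a compiler extension rather than standard C; the struct names and the write_record helper are hypothetical.

```c
#include <stdio.h>

/* The compiler is free to insert padding after 'tag' so that
 * 'value' is suitably aligned in memory. */
struct padded_record {
    char tag;
    int value;
};

/* GCC/Clang extension (an assumption of this sketch): no padding,
 * so the on-disk layout is exactly the members back to back. */
struct packed_record {
    char tag;
    int value;
} __attribute__((packed));

/* Hypothetical helper: write one packed record in binary form.
 * Returns 1 on success, 0 on failure. */
int write_record(FILE *fp, const struct packed_record *rec)
{
    return fwrite(rec, sizeof(*rec), 1, fp) == 1;
}
```

Packing removes padding but not byte-order differences, so a file written this way is only directly readable on machines with the same endianness and int size.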
Built-in File Pointers
  There are 3 file handles that are automatically opened
    stdin: input from the console
    stdout: output to the console
    stderr: error output to the console
  We used stdin back in Lecture 3 when using fgets

    #include <stdio.h>

    char input[256];

    while (fgets(input, sizeof(input), stdin) != NULL) {
        /* Note the lack of '\n': the string returned
         * by fgets() still has the newline on the end */
        printf("You just typed: %s", input);
    }
Standard Error
  Normally stderr goes to the console, just like stdout
  However, the streams can be redirected to separate files
  Use stdout for regular console output
  Use stderr for error messages and debug output
End of File
  EOF is a special value that indicates the end of a file
  Often returned by file handle functions in C
  In the GNU standard C library, EOF is equal to -1
Opening a File
  fopen: opens a given file
  Takes 2 parameters: char *path and char *mode
    path is a string containing the path to the file to be opened
    mode is a string describing how the file should be opened
  fopen returns a FILE * for the file that was just opened
    Returns NULL if an error occurred
Opening a File
  Supported modes for fopen
    r   open the file for reading
    r+  open the file for reading and writing
    w   create a new file for writing
          Any existing file is overwritten
    w+  create a new file for reading and writing
          Any existing file is overwritten
    a   open the file for appending, creating it if it does not exist
          All writes appear at the end of the file
    a+  open the file for reading and appending
          All writes appear at the end of the file
          All reads begin from the start of the file
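As an illustration of the "a" mode, the sketch below appends one line to a file each time it is called. The helper name append_line is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: append one line of text to the file at path.
 * Mode "a" creates the file if needed and never overwrites existing
 * contents. Returns 0 on success, -1 on failure. */
int append_line(const char *path, const char *line)
{
    FILE *f = fopen(path, "a");

    if (f == NULL) {
        perror("fopen");
        return -1;
    }
    if (fprintf(f, "%s\n", line) < 0) {
        perror("fprintf");
        fclose(f);
        return -1;
    }
    if (fclose(f) == EOF) {
        perror("fclose");
        return -1;
    }
    return 0;
}
```

Calling it twice on the same path leaves both lines in the file, in the order they were written; with mode "w" the second call would have discarded the first line.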
Error Handling
  perror: prints the system error message
    Takes 1 argument: a string to prepend to the error message
    Writes a message to stderr that describes the most recent error
    Useful for explaining why a standard library function failed
  errno: variable that stores the current error status
    #include <errno.h> to get access to errno
    errno is used by perror to print an appropriate message
  You may want different behaviour for specific errors
    To do this, you can switch based on errno
Error Handling
  Common values for errno have well-defined names
    EAGAIN        resource unavailable, try again later
    EINTR         function call interrupted by signal
    EBADF         bad file descriptor
    EADDRINUSE    address already in use
    EACCES        permission denied
    ECONNREFUSED  connection refused
    EINVAL        invalid argument
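Switching on errno might look like the sketch below. The helper name open_for_reading is hypothetical, and ENOENT (no such file or directory) is a standard errno value that is not in the list above.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: open a file for reading and give a tailored
 * message for a couple of specific errors. Returns NULL on failure. */
FILE *open_for_reading(const char *path)
{
    FILE *f = fopen(path, "r");

    if (f == NULL) {
        /* Copy errno straight away: later library calls may change it */
        int err = errno;

        switch (err) {
        case ENOENT:
            fprintf(stderr, "%s does not exist\n", path);
            break;
        case EACCES:
            fprintf(stderr, "no permission to read %s\n", path);
            break;
        default:
            errno = err;
            perror("fopen");
            break;
        }
    }
    return f;
}
```

The important habit is to read errno immediately after the failing call, before anything else has a chance to overwrite it.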
File Opening Example
  Trying to open a file for writing
    Note the use of perror when checking for errors

    #include <stdio.h>

    FILE *f;

    f = fopen("/tmp/test", "w");
    if (f == NULL) {
        perror("example fopen");
    }
Checking File State
  feof: check if an EOF was encountered
    Takes one parameter: the FILE * to be checked
    Returns true (non-zero) if the EOF flag is set for that file
  ferror: check if an error occurred
    Takes one parameter: the FILE * to be checked
    Returns true (non-zero) if the error flag is set for that file
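Since many read functions signal both conditions the same way, ferror is how you tell them apart afterwards. A minimal sketch, with the hypothetical helper name count_bytes:

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes in a file.
 * Returns the count, or -1 if the file could not be opened
 * or a read error occurred part-way through. */
long count_bytes(const char *path)
{
    FILE *f = fopen(path, "rb");
    long n = 0;

    if (f == NULL) {
        return -1;
    }
    while (fgetc(f) != EOF) {
        n++;
    }
    /* fgetc returns EOF for end of file AND for errors,
     * so check which one actually stopped the loop */
    if (ferror(f)) {
        perror("fgetc");
        n = -1;
    }
    fclose(f);
    return n;
}
```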
Closing a File
  fclose: closes the file associated with a FILE *
    Output is flushed before closing
    Do not try to reuse the file pointer after calling fclose on it
    Returns 0 on success and EOF on failure
Reading and Writing
           character   string    formatted   binary
  read     fgetc()     fgets()   fscanf()    fread()
  write    fputc()     fputs()   fprintf()   fwrite()
Reading from a File
  fgetc: read a single character from a file
    Takes one parameter: the FILE * to read from
    Returns an int (!) containing the character read
  Every character has a numeric value (ASCII value)
    For example, 'A' is equal to 65
    'man ascii' will give you a full ASCII table
    The returned int is just the ASCII value of the character
  Returns EOF on end of file or error
Reading from a File
  Using fgetc to count B's
    Maybe not the best way to do this, but it's just an example :)

    #include <stdio.h>

    int count_b(char *filename) {
        int c, count = 0;   /* count must start at zero */
        FILE *f = fopen(filename, "r");

        if (!f)
            return 0;
        while ((c = fgetc(f)) != EOF) {
            if (c == 'b' || c == 'B') {
                count++;
            }
        }
        fclose(f);
        return count;
    }
Reading from a File
  fgets: read a line from a file
    We've already seen fgets when learning about reading from stdin
  Three parameters: char *s, int size, FILE *stream
    s is the buffer to read the line into
    size is the size of the buffer
    stream is the file to read from
  Remember, the newline character is stored in the buffer
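If the stored newline is unwanted, one common way to remove it is shown in this sketch. The wrapper name read_line is hypothetical; strcspn is the standard <string.h> function that returns the index of the first character found in a given set.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper around fgets: read one line into s and strip
 * the trailing newline, if there is one.
 * Returns s on success, NULL on end of file or error. */
char *read_line(char *s, int size, FILE *stream)
{
    if (fgets(s, size, stream) == NULL) {
        return NULL;
    }
    /* strcspn gives the index of the first '\n', or the string
     * length if there is none (e.g. the last line of the file) */
    s[strcspn(s, "\n")] = '\0';
    return s;
}
```

This is safe even when the line has no newline, which happens when the line is longer than the buffer or the file does not end with one.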
Reading from a File
  fscanf: read formatted input from a file
    Just like scanf, except reading from a FILE * instead of stdin
    Same formatting rules apply
    Remember, you need to pass in pointers for your variables
  Returns the number of items successfully assigned to variables

    int tstamp, val, ret;
    char name[128];
    FILE *fp = fopen("input.txt", "r");

    if (fp == NULL) {
        perror("fopen");
        return;
    }
    /* %127s leaves room in name for the terminating '\0' */
    ret = fscanf(fp, "%d %127s %d\n", &tstamp, name, &val);
    if (ret != 3) {
        /* Should report error if ret is not EOF */
        fclose(fp);
        return;
    }
Reading from a File
  fread: read binary data from a file
  Takes four parameters: void *ptr, size_t size, size_t nmemb and FILE *stream
    stream is the file you wish to read from
    ptr points to the start of the memory to read into
    size describes the size of each data element
    nmemb describes the number of elements to read
  Returns the number of elements successfully read
    This will be less than nmemb (possibly zero) on both EOF and error
    Use feof and ferror to find out what happened
Reading from a File
  fread example
    Reading a series of doubles from a file

    double nums[100];
    FILE *fp;
    size_t numread = 0;   /* fread returns a size_t */

    fp = fopen("doubles.bin", "r");
    if (fp) {
        numread = fread(nums, sizeof(double), 100, fp);
        if (numread != 100) {
            if (ferror(fp)) {
                perror("fread");
            }
        }
        fclose(fp);
    }
Writing to a File
  fputc: write a single character to a file
  Takes two parameters: int c and FILE *stream
    stream is the file to write the character to
    c is the character to be written, expressed as an int (again)
  Returns an int representing the character written
    Unless there is an error, in which case it returns EOF

    int ch;
    FILE *f = fopen("/tmp/test", "w");

    if (f == NULL) {
        perror("fopen");
        return;
    }
    for (ch = 'A'; ch <= 'Z'; ch++) {
        if (fputc(ch, f) == EOF) {
            perror("fputc");
            break;
        }
    }
    fclose(f);
Writing to a File
  fputs: write a string to a file
    Does NOT stop at a newline, only at '\0'
    Does NOT automatically put a newline at the end of the output
  Returns EOF on error, non-negative number on success

    char str[100];
    /* I'm naughty and don't check for errors */
    FILE *f = fopen("/tmp/test", "w");

    while (fgets(str, sizeof(str), stdin) != NULL) {
        if (fputs(str, f) < 0) {
            perror("fputs");
            break;
        }
    }
    fclose(f);
Writing to a File
  fprintf: print a formatted string to a file
    Almost exactly the same as printf: same formatting rules
    The first argument is a FILE * describing where the output should be written
  Returns the number of characters written successfully, or a negative value on error

    int foo = 100;
    char *str = "Test";
    /* I'm naughty and don't check for errors */
    FILE *f = fopen("/tmp/test", "w");

    fprintf(f, "fprintf is easy! %s %d\n", str, foo);
    fclose(f);

    /* fprintf is great for writing error/debug output */
    fprintf(stderr, "This is going to stderr");
Writing to a File
  fwrite: write binary data to a file
  Takes four parameters: void *ptr, size_t size, size_t nmemb and FILE *stream
    stream is the file you wish to write to
    ptr points to the start of the data you wish to write
    size describes the size of each data element
    nmemb describes the number of elements to write
  Returns the number of elements successfully written
  fwrite is ideal for writing buffer contents to a file
    Note the void pointer as a parameter!
Writing to a File
  Using fwrite to write a large buffer to a file
    In this example, I'm writing out an array of ints

    FILE *fp = NULL;
    int nums[1024];     // pretend this is filled in
    size_t written;

    fp = fopen("random.numbers", "w");
    if (fp == NULL) {
        perror("fopen");
        return 0;
    }
    written = fwrite(nums, sizeof(int), 1024, fp);
    if (written != 1024) {
        perror("fwrite");
    }
    fclose(fp);
    return (written == 1024);
Flushing a File
  Standard I/O is buffered
    stdout is line buffered when it goes to a terminal; regular files are normally fully buffered
    Output is not written to disk or the terminal immediately
    Waits for a newline (if line buffered), the buffer to fill or the file to be closed
    Saves the OS having to touch disk for every I/O function
  On occasion, you may wish to force all buffered output to be written
  fflush: flush all buffered output immediately
    Takes one parameter: the FILE * to be flushed
    fclose automatically flushes the stream before closing

    fputs("This string has no newline", stdout);
    /* The string won't be printed until I flush stdout */
    fflush(stdout);
File I/O with System Calls
  A system call is a request to the kernel to do something
    File I/O is controlled by the kernel, so requires system calls
    File handle functions invoke the system calls for us
  System calls are expensive!
    OS must save state, take control, perform action, return control
    Be careful about making needless system calls
  Everything in Unix (and derivatives) is a file
    Understanding file I/O at the system level is rather important
    System call API for files also covers devices / hardware, pipes
    Lots of crossover with sockets too (networking)
File Descriptors
  First, we need to know about file descriptors
    A file descriptor (or fd) is an abstract representation of a file
    An fd can also represent sockets, pipes, directories and more!
  In POSIX, a file descriptor is an int and each fd is stored in a table
    Each process has its own fd table
    0, 1 and 2 are fds for stdin, stdout and stderr, respectively
  The fd is an index used to find the data structure describing the open file in the kernel
    File offset: the location where the next read or write operation will occur
    File status and access flags: e.g. read only, non-blocking, append mode
Opening a File Descriptor
  open: creates a new open file descriptor
  Two parameters: char *path and int flags
    path is the file to open
    flags describe how the file should be opened
    A third parameter, mode_t mode, gives the permissions for a newly created file and is required when O_CREAT is used
  flags are combined using bitwise OR
  Exactly one of the following access flags must be included:
    O_RDONLY (read only)
    O_WRONLY (write only)
    O_RDWR (read and write)
  Returns the new file descriptor or -1 if an error occurred
    Use perror to find out what happened
Opening a File Descriptor
  Some other file descriptor flags
    O_CREAT: create the file if it doesn't already exist
    O_EXCL: ensure no file already exists with this name
      O_CREAT must also be set to use this
      This will prevent you from overwriting an existing file
    O_APPEND: enable append mode for the file
      If file exists, do not overwrite. Instead, append to the end of it
    O_NONBLOCK: enable non-blocking mode
      Read operations may return immediately if no data available
      Will set errno to EAGAIN
    O_NOATIME: read operations will not update the access time
      Useful for backup operations
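A sketch of how O_CREAT and O_EXCL combine to avoid clobbering an existing file. The helper name create_new_file and the 0644 permissions are assumptions made for illustration; EEXIST is the errno value open sets when O_EXCL finds the file already present.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create path for writing, but only if no file
 * with that name exists yet.
 * Returns the new file descriptor, or -1 on failure. */
int create_new_file(const char *path)
{
    /* The third argument (permissions) is needed because of O_CREAT */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);

    if (fd == -1) {
        if (errno == EEXIST) {
            fprintf(stderr, "%s already exists, not overwriting\n", path);
        } else {
            perror("open");
        }
    }
    return fd;
}
```

Because the existence check and the creation happen inside a single system call, two processes racing to create the same file cannot both succeed.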
Closing a File Descriptor
  close: closes a file descriptor
    Takes one parameter: the fd to be closed
    Returns 0 on success, -1 if an error occurs
Opening a File Descriptor
  open and close example
    Open a file for appending, but create it if it doesn't exist

    #include <sys/types.h>   // for open
    #include <sys/stat.h>    // for open
    #include <fcntl.h>       // for open
    #include <stdio.h>       // for perror
    #include <unistd.h>      // for close

    int fd;

    /* O_CREAT needs a third argument: the permissions for a new file */
    fd = open("myfile.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1) {
        perror("open");
        return -1;
    }
    close(fd);
Reading from a File Descriptor
  read: reads a given number of bytes from a fd
  Three parameters: int fd, void *buf, size_t count
    fd is the file descriptor to read from
    buf is a pointer to memory that data will be read into
    count is the number of bytes to read
  Returns the number of bytes read
    Returns -1 if an error occurs
    Returns 0 if the end of file is reached (not EOF)
  It is possible for read to return fewer bytes than you asked for
    e.g. if your read is interrupted by a signal
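One way to cope with short reads is to keep calling read until the buffer is full. The helper below is a sketch under that assumption; the name read_fully is hypothetical, and retrying on EINTR is a common convention rather than the only option.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: call read() repeatedly until count bytes have
 * arrived, end of file is reached, or an error occurs.
 * Returns the number of bytes read (may be less than count at end of
 * file), or -1 on error. */
ssize_t read_fully(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t total = 0;

    while (total < count) {
        ssize_t ret = read(fd, p + total, count - total);

        if (ret == 0) {
            break;              /* end of file */
        }
        if (ret < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted by a signal: retry */
            }
            return -1;
        }
        total += (size_t)ret;
    }
    return (ssize_t)total;
}
```

Each retry starts at p + total, so data from earlier partial reads is never overwritten.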
Reading from a File Descriptor
  Reading up to 100 bytes from myfile.txt

    #include <fcntl.h>     // for open
    #include <stdio.h>     // for perror
    #include <unistd.h>    // for read and close

    int fd;
    ssize_t buf_read;      /* read returns a ssize_t */
    char buf[100];

    fd = open("myfile.txt", O_RDONLY);
    if (fd == -1) {
        perror("open");
        return -1;
    }
    buf_read = read(fd, buf, sizeof(buf));
    if (buf_read < 0) {
        perror("read");
        close(fd);
        return -1;
    }
    close(fd);
Writing to a File Descriptor
  write: write contents of memory to a file descriptor
  Three params: int fd, void *buf and size_t count
    These have the same meaning as for read, except that data is copied from memory to the file descriptor rather than the other way around
  Returns the number of bytes written
    Returns -1 on error
  Again, write may return less than count
    It is up to you to ensure any subsequent writes begin from the right place!
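Writing to a File Descriptor
  Using write to display a string on the terminal

    #include <stdio.h>     // for perror
    #include <string.h>    // for strlen
    #include <unistd.h>    // for write

    char str[] = "STOP, COLLABORATE AND LISTEN!\n";
    int written = 0, ret;
    char *ptr = str;
    int len = strlen(str);

    while (written < len) {
        /* fd 1 is stdout; only ask for the bytes not yet written */
        ret = write(1, ptr, len - written);
        if (ret < 0) {
            perror("write");
            break;
        }
        written += ret;
        ptr += ret;
    }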
Changing File Offset
  lseek: repositions the file offset for a file descriptor
  Takes 3 parameters: int fd, off_t newoff, int whence
    fd is the file descriptor to reposition
    newoff quantifies the change to be made
      off_t is another special type, used for file offsets
    whence describes how to adjust the file offset
      SEEK_SET: directly set the offset to newoff
      SEEK_CUR: increment the offset by newoff
      SEEK_END: set the offset to newoff bytes past the file end
  Returns the new file offset
    If an error occurs, returns -1
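Because lseek returns the new offset, seeking to the end is a simple way to find a file's size. A sketch, with the hypothetical helper name fd_size; it assumes fd refers to something seekable, such as a regular file.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the size in bytes of the file behind
 * fd, leaving the file offset where it was. Returns -1 on error
 * (for example if fd is a pipe, which cannot seek). */
off_t fd_size(int fd)
{
    off_t cur, end;

    cur = lseek(fd, 0, SEEK_CUR);       /* remember current offset */
    if (cur == (off_t)-1) {
        return -1;
    }
    end = lseek(fd, 0, SEEK_END);       /* 0 bytes past the end */
    if (end == (off_t)-1) {
        return -1;
    }
    if (lseek(fd, cur, SEEK_SET) == (off_t)-1) {  /* put it back */
        return -1;
    }
    return end;
}
```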
More File Descriptor System Calls
  fsync: flushes a file descriptor
    Similar in spirit to fflush, except it takes an fd instead of a FILE *
    Forces data that the kernel has buffered for the file out to the storage device
    Returns 0 on success, -1 on error
  dup: duplicates a file descriptor
    Takes the file descriptor to copy as a parameter
    Returns the next available (lowest-numbered unused) file descriptor
    Both descriptors now refer to the same file
      The duplicate has the same file offset, modes etc.
      If the file offset is changed in one fd, it is also changed in the other
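The shared offset after dup can be observed directly. In this sketch the helper name dup_shares_offset and the choice of offset 5 are arbitrary illustrations; fd is assumed to refer to a seekable file.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demonstration: duplicate fd, move the offset through
 * the original, then read the offset back through the copy.
 * Returns 1 if the copy sees the change, 0 if not, -1 on error. */
int dup_shares_offset(int fd)
{
    int copy = dup(fd);
    off_t moved, seen;

    if (copy == -1) {
        return -1;
    }
    moved = lseek(fd, 5, SEEK_SET);     /* change offset via original */
    seen = lseek(copy, 0, SEEK_CUR);    /* query offset via duplicate */
    close(copy);                        /* the original stays open */

    if (moved == (off_t)-1 || seen == (off_t)-1) {
        return -1;
    }
    return seen == moved;
}
```

Closing the duplicate does not close the original: the underlying open file stays alive until every descriptor referring to it has been closed.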
Descriptors to Handles and Back Again
  fdopen: upgrade a file descriptor to a file handle
    Works exactly like fopen, except it takes a fd instead of a filename
    The mode parameter must be compatible with the mode of the fd
    Do NOT call close on the file descriptor after doing this! Use fclose on the handle instead, which also closes the fd
  fileno: get the file descriptor for a file handle
    Takes one parameter: a FILE *
    Returns the file descriptor associated with that file
    Returns -1 if the file handle is bad
File Handles vs File Descriptors
  File handles operate at a higher level than file descriptors
    A FILE * will contain a file descriptor
  File handles offer more options when reading/writing
    Writing to a file handle: fwrite, fputs, fputc, fprintf
    Writing to a file descriptor: write
  File handle code only works on files
    File descriptor code can be easily adapted to sockets, pipes etc.
  File handle functions are not available in the kernel
    You have to use the lower-level system calls then!
File Handles vs File Descriptors
  Use file handles in userspace applications
    Unless you want a file, socket, pipe, etc. to be interchangeable
  Use file descriptors when working in or near the kernel
    Kernel modules and device drivers
  Whatever you do, don't mix and match for the same file!
    e.g. don't call both read and fread on the same file
More System Calls
  Heap memory allocation using brk and sbrk
    brk and sbrk change the size of the program's heap
    malloc may invoke these system calls when it needs more heap space
      malloc optimises the process
    Userspace programs should never call brk or sbrk directly
More System Calls
  Socket programming
    Networking: connections, sending and receiving messages
    Uses file descriptors but slightly more complicated
      Will cover in more detail in a future lecture
    Unlike file I/O, there is no nice socket handle API
More System Calls
  Processes
    We've already encountered many of the libc wrappers
      fork, waitpid, execve, exit
    All of the various exec* functions that we saw invoke execve
      Each took slightly different input and then converted it into parameters suitable for calling execve
More System Calls
  File system operations
    chdir changes the current working directory
    getcwd gets the current working directory
    chown changes the owner of a file
    mkdir will create a directory
    rename will change the name or location of a file
    There are many, many more...
  Most standard Unix tools are really easy to implement!
    There's a matching syscall wrapper function for most of them
    They even usually have the same name!
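To show how thin the layer between a Unix tool and its wrapper can be, the sketch below combines mkdir, chdir and getcwd, roughly what "mkdir dir && cd dir && pwd" does in a shell. The helper name make_and_enter, the 0755 permissions and the 4096-byte buffer are assumptions for illustration.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a directory, make it the current
 * working directory and print its full path.
 * Returns 0 on success, -1 on failure. */
int make_and_enter(const char *dir)
{
    char cwd[4096];     /* assumed to be large enough for the path */

    if (mkdir(dir, 0755) == -1) {
        perror("mkdir");
        return -1;
    }
    if (chdir(dir) == -1) {
        perror("chdir");
        return -1;
    }
    if (getcwd(cwd, sizeof(cwd)) == NULL) {
        perror("getcwd");
        return -1;
    }
    printf("%s\n", cwd);
    return 0;
}
```

Each step is a single wrapper call plus error checking, which is most of what the corresponding command-line tools do.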
strace
  strace: see what system calls your program is using
    strace is run on the command line, just like gdb
    Prints each system call, its arguments and the return value
  You can attach strace to a running process
    Allows you to see what a hung process is doing
  Running a program with strace
    -f tells strace to follow all forks

    $ strace -f ./myprog

  Attaching strace to an existing process
    In this example, the process has the pid 3443

    $ strace -f -p 3443
Recap
  System Calls
    Operations that require the kernel are done via system calls
    Userspace programs do not invoke system calls directly
      libc implements wrapper functions for system calls
    Not all libc functions are wrappers
      Some require no system calls at all
      Others may call syscall wrappers
Recap
  File I/O using file handles
    Abstraction of system calls that can be used for file operations only
    Most userspace programs should use file handles
    File handles use a special structure: a FILE *
    Many options for reading and writing to files
      fprintf, fgets, fscanf, fputs, fputc, fgetc, fread, fwrite
    Not usable at the kernel level
Recap
  File I/O using file descriptors
    Using the system call wrappers directly
    Uses a file descriptor, which is just an integer representing a file
      File descriptors also represent sockets, pipes and directories
    No nice API functions for formatting input or output
      Only read and write available
    Will need to be familiar with fd operations to do I/O in kernel space