File and Console I/O CS449 Fall 2017
What is a Unix (or Linux) File?
- Narrow sense: a resource provided by the OS for storing information on some kind of durable storage (e.g. HDD, SSD, DVD)
- Wide (UNIX) sense: an interface provided by the OS not only for storing but also for exchanging information
  - Between programs and the OS
  - Between two programs
- The interface uses a stream-of-bytes abstraction
- Interfaces are organized as nodes in a file system
What is a Unix (or Linux) File? Examples of files in Unix
- Storage resource: traditional files, directories, links
- OS -> program communication interface
  - Devices (keyboards, network cards, printers)
    - Device drivers expose information through files
    - Standard location is under the /dev/ directory
  - Processes (instances of running programs)
    - Information about and control of processes is provided through files
    - Standard location is under the /proc/ directory
- Program -> program communication interface
  - Interprocess communication (pipes, shared memory, sockets)
  - Files for IPC can be placed in any agreed-upon place
UNIX File Interface
- A file, in the abstract, is a stream of bytes
- The OS interface consists of five system calls:
  - open(): opens a file and returns a file descriptor
    - File descriptor: index into an OS array called the open file table
  - close(): closes the file descriptor
  - read(): reads bytes starting at the current offset through the file descriptor
  - write(): writes bytes starting at the current offset through the file descriptor
  - lseek(): changes the current offset in the file
- Some files do not support certain operations
  - E.g. a terminal device does not support lseek
  - E.g. a DVD device does not support write
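The five calls can be exercised together in a few lines. The following is a minimal sketch, assuming a POSIX system; the helper name write_then_read and its parameters are hypothetical and chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes of msg to path, seek back to the
 * start, and read the data into out (capacity cap).
 * Returns the number of bytes read, or -1 on error. */
ssize_t write_then_read(const char *path, const char *msg, size_t len,
                        char *out, size_t cap)
{
    /* open() returns a small integer: the file descriptor. */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    ssize_t n = -1;
    if (write(fd, msg, len) == (ssize_t)len &&  /* advances the offset by len */
        lseek(fd, 0, SEEK_SET) == 0)            /* move the offset back to byte 0 */
        n = read(fd, out, cap);                 /* reads from the current offset */

    close(fd);
    return n;
}
```

Without the lseek() call, read() would start at the offset left by write(), which is the end of the file, and would return 0.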
C Standard Library Wrappers
- The C Standard Library wraps file system calls in library functions
  1. For portability across multiple systems
  2. To provide more features (buffering, formatting)
     - printf implements formatting on top of the write() system call
     - printf does line buffering by default when printing to the screen
     - Buffering is controlled using setbuf() calls (remember those?)
- Works on FILE * instead of a file descriptor
  - FILE is a library data structure that abstracts a file
  - The FILE data structure contains the file descriptor, arrays for internal buffers, flags for the buffering mode, etc.
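The buffer inside a FILE can be observed from the outside. The sketch below assumes a POSIX system (for fileno() and fstat()); the helper names size_on_disk and buffered_write_demo are hypothetical. It writes a short line through a fully buffered stream and reports the file's size before and after fflush().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: size of the file behind fp, looked up through
 * the file descriptor stored inside the FILE. Returns -1 on error. */
static long size_on_disk(FILE *fp)
{
    struct stat st;
    return fstat(fileno(fp), &st) == 0 ? (long)st.st_size : -1;
}

/* Hypothetical helper: write one short line to path and record the file
 * size before and after the buffer is flushed. Returns 0 on success. */
int buffered_write_demo(const char *path, long *before, long *after)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    setvbuf(fp, NULL, _IOFBF, BUFSIZ); /* request full buffering explicitly */
    fputs("hello\n", fp);              /* stored in the FILE's internal buffer */
    *before = size_on_disk(fp);        /* nothing has reached the file yet */
    fflush(fp);                        /* hands the buffer to write() */
    *after = size_on_disk(fp);
    return fclose(fp) == 0 ? 0 : -1;
}
```

On a typical implementation the first size is 0 and the second is 6, because the six bytes stay in the library buffer until the flush.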
Anatomy of UNIX File System
(Figure source: M. Tim Jones, Anatomy of the Linux kernel, 06 June 2007)
- Apps can use either system calls or C library calls to access files
- The virtual file system implements the 5 system calls for file access
  - It relies on a real file system (ext3, reiser) or a pseudo file system (proc, dev) to perform the access on devices or OS data structures
Wrappers for the Five System Calls
Defined in the C Standard Library, #include <stdio.h>

Function prototype and description:

FILE *fopen(const char *path, const char *mode);
    Opens the file named by path and associates a stream with it. Returns NULL on error.

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
    Reads nmemb elements of data, each size bytes long, from stream, storing them at the location given by ptr.

size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
    Writes nmemb elements of data, each size bytes long, to stream, obtaining them from the location given by ptr.

int fseek(FILE *stream, long offset, int whence);
    Sets the file position for stream. The new position is offset bytes from the position specified by whence, which can be one of SEEK_SET, SEEK_CUR, SEEK_END.

int fclose(FILE *fp);
    Flushes the stream pointed to by fp, emptying the buffer, and closes the underlying file descriptor.
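As an illustration of fseek() with SEEK_END, here is a minimal sketch that reads the last n bytes of a file. The helper name read_tail is hypothetical, and ftell() (also from <stdio.h>, not listed above) is used to learn the file size.

```c
#include <stdio.h>

/* Hypothetical helper: copy the last n bytes of path into out.
 * Returns the number of bytes copied, or -1 on error. */
long read_tail(const char *path, char *out, long n)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long got = -1;
    if (fseek(fp, 0, SEEK_END) == 0) {        /* jump to the end of the file */
        long size = ftell(fp);                /* offset at the end == file size */
        if (size >= 0) {
            if (n > size)
                n = size;                     /* never seek before byte 0 */
            if (fseek(fp, -n, SEEK_END) == 0) /* back up n bytes from the end */
                got = (long)fread(out, 1, (size_t)n, fp);
        }
    }
    fclose(fp);
    return got;
}
```

The file is opened with "rb" because byte offsets computed this way are only reliable in binary mode.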
File Open Modes

Mode  Description
r     Open text file for reading. Stream is positioned at the beginning of the file.
r+    Same as above, except can also write.
w     Truncate file to zero length or create text file for writing. Stream is positioned at the beginning of the file.
w+    Same as above, except can also read.
a     Open for appending or create text file for writing. Stream is positioned at the end of the file.
a+    Same as above, except can also read. Stream for reading is positioned at the beginning of the file. Stream for writing is positioned at the end of the file.

Adding b to the mode string changes the file to binary mode, or raw mode
- E.g. rb, w+b, wb+
- Disables conversions done in text mode for OS compatibility (e.g. a newline is \n in Unix but \r\n in Windows)
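The difference between "w" and "a" is easiest to see by writing to the same file twice. A minimal sketch follows; the helper names write_with_mode and count_file_bytes are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: open path with the given mode, write text, close.
 * Returns 0 on success, -1 on error. */
int write_with_mode(const char *path, const char *mode, const char *text)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    int ok = fputs(text, fp) != EOF;
    return (fclose(fp) == 0 && ok) ? 0 : -1;
}

/* Hypothetical helper: count the bytes in path by reading to end-of-file.
 * Returns -1 if the file cannot be opened. */
long count_file_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    long n = 0;
    while (fgetc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

Writing "abc" with mode "w" and then "de" with mode "a" leaves a 5-byte file; a further write of "z" with mode "w" truncates it back to 1 byte.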
Binary File Dump Example

    #include <stdio.h>

    int main()
    {
        FILE *rfile;
        int nread;
        char buf[100];

        /* Open this program's own source file in binary mode. */
        if ((rfile = fopen("main.c", "rb")) == NULL)
            return 1;
        /* Copy up to 100 bytes at a time to stdout.
           fread returns 0 at end-of-file or on error. */
        while ((nread = fread(buf, sizeof(char), 100, rfile)) > 0) {
            fwrite(buf, sizeof(char), nread, stdout);
        }
        fclose(rfile);
        return 0;
    }

    >> ./a.out
    (prints the contents of main.c, i.e. the source shown above)
Feof and Ferror

Function prototype and description:

int feof(FILE *stream);
    Returns non-zero if end-of-file has been reached.

int ferror(FILE *stream);
    Returns non-zero if an error occurred.

void perror(const char *str);
    Prints a descriptive message for the last error encountered, prefixed by "str: ".

Proper way to do fread():

    if (fread(buf, sizeof(char), length, rfile) != length) {
        if (feof(rfile))
            printf("End of file.\n");
        else if (ferror(rfile))
            perror("Error");
        else { /* unreachable */ }
    }

- fread() returning fewer elements than requested means either end-of-file or an error
- Must use feof and ferror to determine which occurred
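The same check fits naturally into a read loop. Below is a minimal sketch, with a hypothetical helper count_with_checks, that reads a file in 100-byte chunks and tells a clean end-of-file apart from a read error.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes of path in 100-byte chunks.
 * Returns the total on a clean end-of-file, -1 on an open or read error. */
long count_with_checks(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        perror("fopen");
        return -1;
    }

    char buf[100];
    long total = 0;
    for (;;) {
        size_t n = fread(buf, sizeof(char), sizeof buf, fp);
        total += (long)n;            /* a short read still delivers n bytes */
        if (n < sizeof buf) {        /* short read: end-of-file or error */
            if (ferror(fp)) {
                perror("fread");
                total = -1;
            }
            /* otherwise feof(fp) is set and the count is complete */
            break;
        }
    }
    fclose(fp);
    return total;
}
```

Note that the bytes from the final short read are counted before the loop exits; dropping them is a common mistake when the loop condition only tests for a full buffer.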
Additional (Text) File I/O Functions

Function prototype and description:

int fgetc(FILE *stream);
    Reads the next character from stream and returns it. Returns EOF on end-of-file or error.

int fputc(int c, FILE *stream);
    Writes the character c to stream. Returns c on success or EOF on error.

char *fgets(char *s, int size, FILE *stream);
    Reads in at most one less than size characters from stream and stores them into buffer s. Reading stops after an EOF or a newline. A '\0' is stored after the last character in the buffer. Returns s on success, and NULL on error or when end-of-file occurs before any characters are read.

int fputs(const char *s, FILE *stream);
    Writes the string s to stream, without its trailing '\0'. Returns a non-negative number on success, EOF on error.

int fscanf(FILE *stream, const char *format, ...);
    Same as scanf, except reading from stream rather than stdin.

int fprintf(FILE *stream, const char *format, ...);
    Same as printf, except writing to stream rather than stdout.
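fprintf() and fscanf() pair up naturally for simple text formats. The sketch below, with hypothetical helpers save_numbers and sum_numbers, writes integers one per line and then reads them back.

```c
#include <stdio.h>

/* Hypothetical helper: write n integers to path, one per line.
 * Returns 0 on success, -1 on error. */
int save_numbers(const char *path, const int *v, int n)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    for (int i = 0; i < n; i++)
        fprintf(fp, "%d\n", v[i]);
    return fclose(fp) == 0 ? 0 : -1;
}

/* Hypothetical helper: add up every integer found in path.
 * Stores the result in *sum and returns 0, or returns -1 on open failure. */
int sum_numbers(const char *path, long *sum)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    int x;
    *sum = 0;
    /* fscanf returns the number of items converted; it stops returning 1
     * at end-of-file or at the first token that is not an integer. */
    while (fscanf(fp, "%d", &x) == 1)
        *sum += x;
    fclose(fp);
    return 0;
}
```

The "%d" conversion skips leading whitespace, including the newlines written by save_numbers, so no separate delimiter handling is needed.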
Text File Dump Example

    #include <stdio.h>

    int main()
    {
        FILE *rfile;
        char buf[100];

        /* Open this program's own source file in text mode. */
        if ((rfile = fopen("main.c", "r")) == NULL)
            return 1;
        /* Copy one line (or up to 99 characters of it) at a time;
           fgets returns NULL at end-of-file or on error. */
        while (fgets(buf, 100, rfile)) {
            fputs(buf, stdout);
        }
        fclose(rfile);
        return 0;
    }

    >> ./a.out
    (prints the contents of main.c, i.e. the source shown above)
Standard I/O
- There are three standard streams for console I/O declared in <stdio.h>
  - extern FILE *stdin; (for standard input)
  - extern FILE *stdout; (for standard output)
  - extern FILE *stderr; (for standard error)
- stdout is used for normal output; stderr is used to output error messages
- printf(...) is equivalent to fprintf(stdout, ...)
- The underlying file descriptors for stdin, stdout, stderr are 0, 1, 2 respectively
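Keeping results and diagnostics on separate streams is a matter of which FILE * is passed to fprintf(). A minimal sketch, with a hypothetical helper print_quotient that takes the two streams as parameters so that a caller would normally pass stdout and stderr:

```c
#include <stdio.h>

/* Hypothetical helper: print a / b to out, or a diagnostic to err
 * when b is 0. Returns 0 on success, 1 on failure. */
int print_quotient(int a, int b, FILE *out, FILE *err)
{
    if (b == 0) {
        fprintf(err, "error: division by zero\n"); /* diagnostics go to err */
        return 1;
    }
    fprintf(out, "%d\n", a / b);                   /* normal output goes to out */
    return 0;
}
```

Called as print_quotient(6, 3, stdout, stderr), the result goes to descriptor 1; called with b equal to 0, the message goes to descriptor 2 and stays visible even if normal output is sent elsewhere.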
Standard I/O Redirection
Unix shells allow standard I/O to be redirected to/from another file

Operator and description:

command [N]< file
    Redirect input of file descriptor N to file while running command. N defaults to 0 (stdin).

command [N]> file
    Redirect output of file descriptor N to file while running command. N defaults to 1 (stdout). If file does not exist, it is created. If it exists, it is truncated.

command [N]>> file
    Redirect output of file descriptor N to file while running command. N defaults to 1 (stdout). If file does not exist, it is created. If it exists, output is appended to the end of the file.

Examples:
    ls -l > ./ls.out 2> ./ls.err    (redirect stdout to ls.out and stderr to ls.err)
    cat < ./ls.out                  (redirect ls.out to stdin of cat)
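The slide does not say how a shell implements redirection. A common approach on POSIX systems, assumed here, is to open the target file and copy its descriptor onto N with dup2() before the command starts. The helper name redirect_output_fd and the 0644 permission bits are hypothetical choices for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make descriptor n refer to path, in the manner of
 * "N> file" (append == 0) or "N>> file" (append != 0).
 * Returns 0 on success, -1 on error. */
int redirect_output_fd(int n, const char *path, int append)
{
    /* ">" truncates an existing file, ">>" appends; both create it if missing. */
    int flags = O_WRONLY | O_CREAT | (append ? O_APPEND : O_TRUNC);
    int fd = open(path, flags, 0644);
    if (fd < 0)
        return -1;
    if (fd != n) {
        if (dup2(fd, n) < 0) {   /* n now refers to the same open file as fd */
            close(fd);
            return -1;
        }
        close(fd);               /* the temporary descriptor is no longer needed */
    }
    return 0;
}
```

After redirect_output_fd(1, "ls.out", 0), anything the process writes to descriptor 1, including printf output, ends up in ls.out.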
File Redirection Viewed Through /proc
- ls -l /proc/self/fd lists the file descriptors of the current process (ls itself)
  - 0, 1, 2 are stdin, stdout, stderr and are connected to /dev/pts/0 (the console)
  - 3 is /proc/16970/fd (a link to the directory /proc/self/fd opened by ls)

    (84) thot $ ls -l /proc/self/fd
    total 0
    lrwx------ 1 wahn UNKNOWN1 64 Sep 11 13:37 0 -> /dev/pts/0
    lrwx------ 1 wahn UNKNOWN1 64 Sep 11 13:37 1 -> /dev/pts/0
    lrwx------ 1 wahn UNKNOWN1 64 Sep 11 13:37 2 -> /dev/pts/0
    lr-x------ 1 wahn UNKNOWN1 64 Sep 11 13:37 3 -> /proc/16970/fd

- ls -l /proc/self/fd > /tmp/ls.out redirects stdout to /tmp/ls.out
  - Now 1 (stdout) is changed to /tmp/ls.out

    (85) thot $ ls -l /proc/self/fd > /tmp/ls.out && cat /tmp/ls.out
    total 0
    lrwx------ 1 wahn UNKNOWN1 64 Sep 11 13:41 0 -> /dev/pts/0
    l-wx------ 1 wahn UNKNOWN1 64 Sep 11 13:41 1 -> /tmp/ls.out
    lrwx------ 1 wahn UNKNOWN1 64 Sep 11 13:41 2 -> /dev/pts/0
    lr-x------ 1 wahn UNKNOWN1 64 Sep 11 13:41 3 -> /proc/17025/fd
System Calls Beneath C Library Calls
- strace ./a.out
  - strace is a utility that prints the system calls made by a program
  - a.out is the text file dumper using fgets and fputs, run on a source file

    (116) thoth $ strace ./a.out
    open("hello.c", O_RDONLY) = 3
    read(3, "#include <stdio.h>\nint main()\n{\n"..., 4096) = 75
    write(1, "#include <stdio.h>\n", 19) = 19
    write(1, "int main()\n", 11) = 11

    (117) thoth $ strace ./a.out > ./a.stdout
    open("hello.c", O_RDONLY) = 3
    read(3, "#include <stdio.h>\nint main()\n{\n"..., 4096) = 75
    write(1, "#include <stdio.h>\nint main()\n{\n"..., 75) = 75

- Notice the difference in buffering with and without stdout redirection
  - Without: the write buffer is flushed on each line (because output is streamed to an interactive terminal)
  - With: the buffer is flushed only when it fills or when the stream is closed at exit (for efficiency, because output is streamed to storage)
- The shell only changes what descriptor 1 refers to; the C library inside a.out picks the buffering policy by checking whether stdout is a terminal, and the program can override it with setbuf()
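The choice between line buffering and full buffering can also be made explicitly. The sketch below mirrors the usual default rule (line buffered for a terminal, fully buffered otherwise) using isatty() and setvbuf(); it assumes a POSIX system, and the helper name apply_default_buffering is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: choose line buffering if fp is attached to a
 * terminal and full buffering otherwise, then apply that choice.
 * Must be called before any other I/O on fp.
 * Returns the mode chosen (_IOLBF or _IOFBF), or -1 on failure. */
int apply_default_buffering(FILE *fp)
{
    int mode = isatty(fileno(fp)) ? _IOLBF : _IOFBF;
    return setvbuf(fp, NULL, mode, BUFSIZ) == 0 ? mode : -1;
}
```

Calling apply_default_buffering(stdout) at the start of main would select _IOLBF when run from a terminal and _IOFBF when run with "> file", matching the two strace traces above.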
File Formats
It is the programmer's responsibility to design file formats (below, + marks an advantage and - a drawback)

Text format (e.g. config files, log files, HTML, XML)
  + Human readable
  - Bloated representation
    -> Numbers are represented as strings
  - Hard to do random access (records vary in length, so access is sequential)
    -> Usually formatted using delimiters (e.g. spaces, newlines)

Binary format (e.g. image files, database files)
  + Compact representation
    -> Numbers are represented in binary
  + Easy to do random access using lseek
    -> Fixed-length records
  - Needs translation for human comprehension
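Random access into fixed-length binary records comes down to multiplying the record index by the record size. The following is a minimal sketch; struct record and the helpers save_records and load_record are hypothetical, and writing a raw struct like this is only safe when the same machine and compiler read the file back.

```c
#include <stdio.h>

/* Hypothetical fixed-length record for illustration. */
struct record {
    int id;
    double score;
};

/* Hypothetical helper: write n records to path in binary form.
 * Returns 0 on success, -1 on error. */
int save_records(const char *path, const struct record *r, size_t n)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(r, sizeof *r, n, fp);
    return (fclose(fp) == 0 && written == n) ? 0 : -1;
}

/* Hypothetical helper: load record number index (0-based) without
 * reading the records before it. Returns 0 on success, -1 on error. */
int load_record(const char *path, long index, struct record *out)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    /* Every record has the same size, so record i starts at i * size. */
    int ok = fseek(fp, index * (long)sizeof *out, SEEK_SET) == 0 &&
             fread(out, sizeof *out, 1, fp) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}
```

A text file holding the same data would need to be scanned line by line to find the third record, since the length of each line depends on the digits it contains.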