
Systems Programming
COSC 2031 - Software Tools

Systems Programming (K+R Ch. 7, G+A Ch. 12)
- The interfaces we use to work with the operating system
  - In this case: Unix
- Programming at a lower level

Systems Programming
- The interfaces here are Unix-specific (i.e. they may not exist on other platforms)
- They are also low-level interfaces
- In many cases, other functions you use (e.g. printf) are built on top of these functions

High-Level vs. Low-Level
    Your program:    read_dict("words")
    C Library:       fopen("words", "r")
    Unix interface:  open("words", O_RDONLY)
    Kernel:          actual work is done here
- Higher-level interfaces tend to be
  - more abstract/generalized
  - simpler to program but less powerful
  - more standard (e.g. Standard C Library)
- Lower-level interfaces tend to be
  - more specific
  - harder to work with but more powerful
  - potentially more efficient

System Calls
- System calls are functions under Unix which are implemented in the kernel
- See section 2 of the manual pages
- e.g. chdir(2) - changes the current working directory (like the "cd" command)
  - your program calls this function in C, but the kernel does the real work

File Descriptors
- At the Unix level, files are represented by integers called file descriptors
- For what we will talk about next we need:
    #include <unistd.h>
    #include <fcntl.h>

read
- There are two main system calls for input and output
    int read(int fd, void *buf, int n)
- Reads up to n bytes from the file descriptor fd and stores the bytes at the address pointed to by buf
- Returns the number of bytes read (0 at end of file) or <0 on an error
- (POSIX declares the return type as ssize_t and n as size_t; the int version shown here is the classic simplification)

write
- The write call is similar
    int write(int fd, void *buf, int n)
- Same as read() except that n bytes starting at address buf are written to file descriptor fd

open
- Opens a file for input and/or output
    int open(char *path, int flags, int mode)
- path is obvious - it's the name of the file you want to work with
- flags is a set of bits (remember the C bitwise operators?) that describes how we will work with the file
- flags is a bitwise-or of several predefined constants

flags
    O_RDONLY - open file in read-only mode
    O_WRONLY - open file in write-only mode
    O_RDWR   - open file for reading and writing
    O_CREAT  - create file if it does not exist
    O_APPEND - all writes are made to the end of the file
    O_EXCL   - (only with O_CREAT) fails if file already exists

mode
- mode is the set of default permissions for the file as an integer
- Again these permissions are bits
- POSIX defines constants (<sys/stat.h>)
    S_IRUSR - read permission (owner)
    S_IWGRP - write permission (group)
    S_IXOTH - execute permission (other)

modes
- The values of the modes are so common that we often use the octal values directly:
             read   write  exec
    user     0400   0200   0100
    group    0040   0020   0010
    other    0004   0002   0001
- So:
    open("foobar", O_CREAT | O_WRONLY, 0644);
  would attempt to open "foobar" for writing. If "foobar" didn't exist it would try to create the file with permissions "0644"
- Note that if we are not using O_CREAT then we do not need to give the "mode" argument:
    open("foobar", O_RDONLY);
  is okay

close
- When we are done with a file descriptor, we close it:
    int close(int fd);

open vs. fopen?
- calling open() always switches control to the operating system [system call]
- making a system call has overhead (i.e. time required for making any system call)
- fopen() may buffer data in memory
- Try the example /cs/course/2031/.c

open vs. fopen?
- fopen() uses open() to actually open a file
- With fopen() you also have fgetc(), fgets()
- With open() you only have read(), write()
- However open() is more flexible (O_CREAT, O_EXCL, specifying mode)

- We saw how to do temporary files in the shell, but how do we do them in C?
- See standard functions tmpfile(3c) and tmpnam(3c)
- Let's see how they would work
- Recall from our shell discussion
    gettemp() {
        id=0
        while [ -f $1.$$.$id ]; do
            id=`expr $id + 1`
        done
        echo $1.$$.$id
    }
- Here we test to see if the file exists
- We can do something similar with open:
    /* needs <errno.h>, <fcntl.h>, <stdio.h>, <unistd.h> */
    int gettemp(char *base)
    {
        char tmp[256];
        int pid = (int)getpid();
        int cnt = 0, fd;

        do {
            snprintf(tmp, sizeof tmp, "%s.%d.%d", base, pid, cnt++);
            /* O_EXCL makes open() fail if the name is already taken,
               which plays the role of the shell's [ -f ... ] test */
            fd = open(tmp, O_RDWR | O_CREAT | O_EXCL, 0600);
        } while (fd < 0 && errno == EEXIST);
        return fd;    /* <0 if open() failed for some other reason */
    }
- Note that we do not return a file name but a file descriptor
- We can then work with that file descriptor with read(2) and write(2)
- But what if we want to write some data to the temporary file and then read it back?

lseek
- The lseek(2) call allows us to set the position of a file descriptor in the file
    int lseek(int fd, int offset, int whence);
  (the POSIX prototype uses off_t for the offset and the return value)
- whence is one of
    SEEK_SET - offset is from the start of the file
    SEEK_CUR - offset is from the current position
    SEEK_END - offset is from the end of the file

lseek
- Example:
    lseek(5, 0, SEEK_SET);
  sets the position of file descriptor 5 to an offset of 0 from the beginning of the file (SEEK_SET).
- In other words, the next byte read from the file will come from the beginning of the file

Temporary Files + lseek
    int fd;
    fd = gettemp("/tmp/work");
    write(fd, data, 10000);
    lseek(fd, 0, SEEK_SET);
    /* now we can read the data back */
    read(fd, newdata, 10000);

open vs. fopen again
- There are similar functions for fopen()-type files (see rewind(3c) and fseek(3c))
- What if we really want to process a file descriptor's data one byte at a time?
- We can use successive calls to read(), or fdopen

fdopen
    FILE *fdopen(int fd, char *mode);
- Like fopen() but takes a file descriptor instead of a file name
- You can then use getc(), putc(), etc. on the returned FILE object

File Descriptors vs. Files
- A file or a stream is a sequence of bytes
- Think of a file descriptor as one point of view of a file
- You can open the same file twice (different file descriptors)
- Different file descriptors can be at different positions in a file (even if they refer to the same file)

File Descriptors vs. Files
- An example:
    int fd1, fd2;
    fd1 = open("foobar", O_RDONLY);
    fd2 = open("foobar", O_RDONLY);
    lseek(fd1, 1000, SEEK_SET);
- This is valid and creates two file descriptors
  [Diagram: bytes 0, 1, 2, ... 999, 1000, 1001 of foobar; fd2 points at byte 0, fd1 at byte 1000]
- So what happens?
    read(fd1, buf, 5);
  [Diagram: the same bytes; fd2 still points at byte 0, fd1 now points at byte 1005]
- Note that fd2's position is unaffected by fd1

dup
- We can make a copy of a file descriptor
    int dup(int fd);
- Returns a new file descriptor which is a copy of the file descriptor fd (points to the same thing at the same location)

dup2
- A more useful variation on this is called dup2:
    int dup2(int fd, int nfd);
- Tries to create a copy of file descriptor fd with the new file descriptor id nfd. (Like dup() except you get to choose what the new file descriptor number is.)

Back to the Bourne shell
- Remember this?
    command 2>file
- Executes command and redirects stderr to the file named file.
- The "2>" syntax means "redirect file descriptor 2"

Standard File Descriptors
- The following file descriptor values are considered standard:
    0 = stdin
    1 = stdout
    2 = stderr
- So
    write(1, buf, n);
  writes n bytes from address buf to stdout

To Think About
- File descriptors can refer to more than just files. They can also refer to pipes between programs, network connections, etc. [More on this next day.]
- Play around with the "truss" program in the Prism lab (traces system calls). For example, try:
    truss echo hi
- Try it on other commands too.