ADVANCED I/O ISA 563: Fundamentals of Systems Programming
Agenda
- File Locking
- File locking exercise
- Unix Domain Sockets
- Team Projects Time
File Locking Background
- Both high-performance and general-purpose database systems require atomic access to database resources and records
- Unix systems have evolved to support this need:
  - flock(2): early locking primitive; only locks whole files
  - fcntl(2): raw, powerful, configurable interface
  - lockf(3): simplified API built on fcntl(2)
- The key ability is to lock sections, or records, within a file
Recall: Properties of Unix Files
- Unix files contain no markup
  - A file is a sequence of raw data bytes
  - Contrast this with Windows files, which contain markup and index information
- File systems maintain maps to data blocks
Advisory vs. Mandatory Locking
- Advisory locking is for cooperating processes: a group of processes that have a gentleman's agreement to access the data only through a particular API
- Mandatory locking puts the burden on the kernel (really, the file system code) to check each open, read, and write system call and intercept any process trying to access a locked file
Aside: Security Considerations
- Note the trust relationships here! The TCB includes the kernel
- Note how mandatory locking is enabled via a complete abuse/hack of the file permission bits
  - Set-group-ID ON, group-execute OFF
  - The semantics of this combination are otherwise entirely unclear
  - Similar to playing with unusual combinations of memory page permissions
- Mandatory locking can be used maliciously to prevent legitimate access to a file
Early Forms of File Locking: flock(2)
flock(2) is a system call that allows cooperating processes to lock a whole file. It uses advisory locks.
    #include <sys/file.h>
    int flock(int fd, int operation);
flock(2) operations
    Symbol    Value   Meaning
    LOCK_SH   1       Shared lock
    LOCK_EX   2       Exclusive lock
    LOCK_NB   4       Don't block when locking
    LOCK_UN   8       Unlock
Note how the values follow the flag pattern we discussed last class: each has its significant bit at a non-overlapping position in its binary representation, so flags can be combined with bitwise OR (e.g., LOCK_EX | LOCK_NB).
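A minimal sketch of how the flags combine in practice, assuming Linux flock(2) semantics on a local file system. The helper name second_lock_is_refused and the path passed to it are hypothetical. It opens the same file twice, takes an exclusive lock through the first descriptor, and shows that OR-ing LOCK_NB into the second request turns a would-be block into an immediate EWOULDBLOCK error.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a second, independent open of `path`
 * is refused an exclusive lock while the first descriptor holds one,
 * 0 if the lock was unexpectedly granted, and -1 on any other error. */
int second_lock_is_refused(const char *path)
{
    int result = -1;
    int fd1 = open(path, O_RDWR | O_CREAT, 0600);
    int fd2 = open(path, O_RDWR);

    if (fd1 < 0 || fd2 < 0)
        goto out;

    if (flock(fd1, LOCK_EX) < 0)            /* would block until granted */
        goto out;

    /* LOCK_EX | LOCK_NB: ask for an exclusive lock, but do not wait. */
    if (flock(fd2, LOCK_EX | LOCK_NB) == 0)
        result = 0;
    else if (errno == EWOULDBLOCK)
        result = 1;

    flock(fd1, LOCK_UN);
out:
    if (fd1 >= 0) close(fd1);
    if (fd2 >= 0) close(fd2);
    return result;
}
```

Calling second_lock_is_refused("/tmp/flock_flags_demo.lock") should return 1 on a typical Linux system; some network file systems emulate flock(2) differently, so treat the result there as unspecified.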
flock(2) Caveats
- flock(2) is useful but coarse-grained
- Advisory locking: only cooperating processes that actually use the flock(2) interface obey the locking restrictions
- flock(2) locks whole files, not regions of a file
- The lock belongs to the open file (the kernel's open file table entry), not to an individual file descriptor
  - Child processes (via fork(2)) and new file descriptors (via dup(2)) share the existing lock; they do not get a new one
  - A child can thus unlock the file and cause the parent to lose the lock
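The shared-lock caveat can be observed within a single process using dup(2). The following is a sketch under the assumption of Linux flock(2) semantics on a local file system; the helper name dup_unlock_releases_lock is hypothetical. It locks through one descriptor, unlocks through a duplicate, and then checks whether an unrelated open of the same file can take the lock.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if unlocking through a dup(2)'d
 * descriptor released the lock taken through the original descriptor,
 * 0 if the lock survived, and -1 on error. */
int dup_unlock_releases_lock(const char *path)
{
    int result = -1;
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    int alias = (fd >= 0) ? dup(fd) : -1;   /* same open file as fd */
    int other = open(path, O_RDWR);         /* independent open file */

    if (fd < 0 || alias < 0 || other < 0)
        goto out;

    if (flock(fd, LOCK_EX) < 0)
        goto out;
    if (flock(alias, LOCK_UN) < 0)          /* unlock via the duplicate */
        goto out;

    /* If the lock is really gone, the independent open can take it. */
    result = (flock(other, LOCK_EX | LOCK_NB) == 0) ? 1 : 0;
out:
    if (fd >= 0) close(fd);
    if (alias >= 0) close(alias);
    if (other >= 0) close(other);
    return result;
}
```

The same effect applies across fork(2): the child's inherited descriptor refers to the same open file, so a LOCK_UN in the child drops the parent's lock as well.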
Raw Record Locking: fcntl(2)
- The fcntl(2) system call is a general file-control interface; among other things, it can lock a whole file or individual portions (regions) of a file
- More powerful than flock(2), and hence somewhat more complex
- The flock structure holds metadata about the lock
The flock structure
This structure describes:
- The type of lock (l_type)
  - Shared read: F_RDLCK
  - Exclusive write: F_WRLCK
  - Unlock: F_UNLCK
- The size of the region to lock, in bytes (l_len)
- The offset at which to begin locking: l_start, interpreted relative to l_whence
- A process ID (l_pid) of a process that *may* already hold a lock on this file
Using fcntl(2)
    #include <fcntl.h>
    int fcntl(int filedes, int cmd, struct flock *lock);
- For the locking commands, the third argument is a pointer to a flock structure
- cmd is one of F_GETLK, F_SETLK, F_SETLKW
- filedes must be open for reading or writing, as appropriate to the desired type of lock
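As an illustration of filling in the flock structure and passing it to fcntl(2), here is a short sketch. The helper names set_region_lock and lock_and_unlock_demo, the path, and the byte range are all hypothetical choices for the example.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: apply `type` (F_RDLCK, F_WRLCK, or F_UNLCK) to
 * `len` bytes starting at absolute offset `start`, without blocking.
 * Returns whatever fcntl(2) returns: 0 on success, -1 on failure. */
int set_region_lock(int fd, short type, off_t start, off_t len)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;     /* l_start is measured from start of file */
    fl.l_start = start;
    fl.l_len = len;             /* a length of 0 means "through end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

/* Write-lock bytes 100..149 of `path`, then release them. */
int lock_and_unlock_demo(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);   /* O_RDWR: needed for F_WRLCK */
    int rc;

    if (fd < 0)
        return -1;
    rc = set_region_lock(fd, F_WRLCK, 100, 50);
    if (rc == 0)
        rc = set_region_lock(fd, F_UNLCK, 100, 50);
    close(fd);
    return rc;
}
```

The locked region does not need to exist yet; locking bytes past the current end of file is permitted.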
The fcntl cmd parameter
    F_GETLK
        Determine whether the lock described by the flock structure would conflict with a lock held by some other process. If so, the l_pid member of the flock structure is filled in with that process ID. If no conflicting lock exists, the flock structure remains unchanged except that l_type is set to F_UNLCK.
    F_SETLK
        Set the lock (via F_RDLCK or F_WRLCK) or unset it (via F_UNLCK). Returns immediately with an error if the lock cannot be granted.
    F_SETLKW
        The blocking version of F_SETLK (W means "wait").
Testing with F_GETLK and then trying to grab the lock with F_SETLK is not an atomic operation; two processes can race to grab the lock.
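To see F_GETLK report a lock holder, a second process is needed, because a process never conflicts with its own fcntl(2) locks. The sketch below uses fork(2) for that; the helper name child_sees_parent_lock and the byte ranges are hypothetical. The parent write-locks bytes 0 through 9, and the child probes byte 5.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a child process sees the parent's
 * write lock via F_GETLK (right type, right owner), 0 if not, -1 on error. */
int child_sees_parent_lock(const char *path)
{
    struct flock fl, probe;
    pid_t parent, child;
    int status = 0;
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return -1;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 10;
    if (fcntl(fd, F_SETLK, &fl) < 0) { close(fd); return -1; }

    parent = getpid();
    child = fork();
    if (child < 0) { close(fd); return -1; }

    if (child == 0) {
        /* fcntl locks are not inherited across fork(2), so from the
         * kernel's point of view the child is "some other process". */
        memset(&probe, 0, sizeof(probe));
        probe.l_type = F_WRLCK;     /* "could I write-lock this byte?" */
        probe.l_whence = SEEK_SET;
        probe.l_start = 5;
        probe.l_len = 1;
        if (fcntl(fd, F_GETLK, &probe) < 0)
            _exit(2);
        _exit(probe.l_type == F_WRLCK && probe.l_pid == parent ? 1 : 0);
    }

    if (waitpid(child, &status, 0) < 0) { close(fd); return -1; }
    close(fd);                      /* closing also drops the parent's lock */
    if (!WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status);
}
```

Note that even a successful probe says nothing about the future: by the time the caller acts on the answer, the lock may have been released or taken by someone else, which is the race described above.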
Simplifying Life with lockf(3)
- This function is a simplified API that uses fcntl(2) underneath in its implementation
- Usually used in conjunction with lseek(2) or fseek(3), because it has no parameter to say where in the file to lock from, just the size of the region to lock; the region starts at the current file offset
The lockf(3) interface
    #include <unistd.h>
    int lockf(int filedes, int function, off_t size);
- filedes must be open with either O_WRONLY or O_RDWR
- size is the size (in bytes) of the region to lock, starting at the current file offset. A value of zero means lock from the current offset through the end of the file, however large the file grows
- function is described on the next slide
The lockf(3) function parameter
    Function   Description
    F_ULOCK    Unlock locked section
    F_LOCK     Lock a section for exclusive use
    F_TLOCK    Test and lock a section for exclusive use
    F_TEST     Test a section for locks by other processes
Note that there is no distinction between read and write locks like with fcntl(2). Note also the atomic operation F_TLOCK. If testing and then locking with fcntl(2) is not atomic, then how might we get this operation to be atomic? Food for thought.
File Locking Experiment Using flock(2), write two processes that share information via a file called myinfo. One process accepts user input and writes it to the file. The other process should attempt to lock this file and read data from it. Does this process always block? Create a third process that writes to the file without obtaining a lock via flock(2). Observe results.
Unix Domain Sockets
- A form of fast IPC, using standard Unix names
- An alternative to using Internet sockets
- UDS datagrams (unlike Internet UDP) are reliable
Advantages of Unix Domain Sockets
- Can be referred to via a filename
  - This is the standard Unix way of naming things; contrast with other forms of IPC that require a new, complex namespace
  - Can use standard file tools (e.g., ls, rm) with them
- They are fast: they only copy data
- They do not involve:
  - protocol state
  - headers to add, remove, or checksum
  - sequence numbers
  - acknowledgements to send
  - keepalives
More Advantages of UDS
- Both stream and datagram interfaces (like TCP and UDP)
- Datagram service is reliable:
  - No lost messages
  - Messages are delivered in order
- Can use the network-based sockets API or the socketpair() function
The socketpair(2) system call
- Sets up a pair of unnamed UNIX domain sockets
  - The endpoints will be connected
  - But they remain nameless
- After socketpair() returns, the only way to refer to the endpoints is via the 4th argument, an array of two socket descriptors
- For a named socket, use the standard sockets API (the same one used for Internet sockets) to bind an address (i.e., a pathname) to a UDS socket descriptor
The socketpair(2) signature
    #include <sys/socket.h>
    int socketpair(int domain, int type, int protocol, int sockfd[2]);
- domain should be AF_UNIX
- type is SOCK_STREAM or SOCK_DGRAM
- protocol is optional; use zero
- sockfd stores the two socket descriptor handles
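A minimal sketch of socketpair(2) with SOCK_DGRAM, illustrating the in-order, boundary-preserving delivery of UDS datagrams. The helper name socketpair_roundtrip and the message contents are hypothetical. Both ends live in one process here for brevity; in practice one end is usually handed to a child after fork(2).

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send two datagrams over a socketpair and check
 * that they arrive in order with message boundaries intact.
 * Returns 1 if so, 0 if not, and -1 on error. */
int socketpair_roundtrip(void)
{
    int sv[2];
    char buf[32];
    ssize_t n1, n2;
    int first_ok, second_ok;
    int result = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    /* Each write on a datagram socket is one message. */
    if (write(sv[0], "first", 5) != 5)
        goto out;
    if (write(sv[0], "second", 6) != 6)
        goto out;

    /* Each read returns exactly one message, oldest first. */
    n1 = read(sv[1], buf, sizeof(buf));
    first_ok = (n1 == 5 && memcmp(buf, "first", 5) == 0);
    n2 = read(sv[1], buf, sizeof(buf));
    second_ok = (n2 == 6 && memcmp(buf, "second", 6) == 0);

    result = (first_ok && second_ok) ? 1 : 0;
out:
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

With SOCK_STREAM instead, the two writes could be returned by a single read, since a stream has no message boundaries.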
Alternative: Using socket(2)
- Must use the struct sockaddr_un structure to load in the desired name, use the socket(2) system call to create the socket, and then bind(2) the name to it
- The sun_path member of this structure is a statically sized character array that can hold a file name
- Example on next slide (no error checking, condensed from the example on page 596 in APUE)
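    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    void foo(void)
    {
        int sd;
        struct sockaddr_un un;

        /* Zero the structure so sun_path stays NUL-terminated. */
        memset(&un, 0, sizeof(un));
        un.sun_family = AF_UNIX;
        strncpy(un.sun_path, "somesocket.sock", sizeof(un.sun_path) - 1);

        /* Create the socket, then bind the pathname to it. */
        sd = socket(AF_UNIX, SOCK_DGRAM, 0);
        bind(sd, (struct sockaddr *)&un, sizeof(un));
    }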
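To show the named socket in use, here is a sketch that adds error checking and a sender. The helper name named_uds_roundtrip, the path argument, and the "hello" message are hypothetical. One datagram socket is bound to a pathname, a second unnamed socket sends to that pathname, and the name is removed with unlink(2) afterwards, since a bound socket file persists after the descriptor is closed.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: bind a datagram socket to `path`, send it "hello"
 * from a second socket, and check what arrives.
 * Returns 1 on a matching round trip, 0 on a mismatch, -1 on error. */
int named_uds_roundtrip(const char *path)
{
    struct sockaddr_un addr;
    char buf[16];
    ssize_t n;
    int result = -1;
    int server = -1, client = -1;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;                      /* name would not fit in sun_path */
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    unlink(path);       /* bind(2) fails if the name already exists */
    server = socket(AF_UNIX, SOCK_DGRAM, 0);
    client = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (server < 0 || client < 0)
        goto out;
    if (bind(server, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;

    /* The client addresses the server by its file name. */
    if (sendto(client, "hello", 5, 0,
               (struct sockaddr *)&addr, sizeof(addr)) != 5)
        goto out;

    n = recv(server, buf, sizeof(buf), 0);
    if (n < 0)
        goto out;
    result = (n == 5 && memcmp(buf, "hello", 5) == 0) ? 1 : 0;
out:
    if (server >= 0) close(server);
    if (client >= 0) close(client);
    unlink(path);       /* remove the socket file */
    return result;
}
```

While the server socket is bound, the path shows up in ls -l with file type s, which is the "standard file tools" advantage mentioned earlier.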
Team Projects Time Use the remaining time to meet with your team and discuss / work on projects.