Socket programming in C

Sven Gestegård Robertz
September 2017

Abstract

A socket is an endpoint of a communication channel or connection, and can be either local or over the network. This document gives a brief introduction to socket programming, particularly IPv4 network sockets, using the C standard library. A socket can be either a server socket, accepting connections, or a client socket, connecting to a server socket. The C library functions for creating a socket are presented in Section 1. For sending and receiving data, a socket is abstracted as a file, so the same functions that are used for reading and writing normal files are used to send and receive data over a socket. The difference lies in how a socket is opened and set up. The C library functions for reading and writing files are presented in Section 2. Section 3 shows how to open actual files, and Section 4 gives examples of how error handling can be expressed.

This is a brief introduction to the topics. For further information, the reader is referred to the manual pages of the presented functions (see Section 5).

1 Sockets

A socket is created using the function socket(). It can then be made a server socket with calls to bind() (binding a name to the socket), listen() (making the socket a passive socket, listening for incoming connections) and accept() (which blocks until a client connects). To make it a client socket, the function connect() is used. The prototypes of the socket functions are

    int socket(int domain, int type, int protocol);

where domain is the kind of socket (AF_INET for IPv4) and type is the type of connection (SOCK_STREAM for a connection-based byte stream, i.e., TCP, and SOCK_DGRAM for a datagram-based channel, i.e., UDP). protocol specifies the protocol to be used with the socket. Often only one protocol exists for a given domain and type, and 0 is used. The return value is a non-negative file descriptor for the created socket, or -1 on error.
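
As a minimal sketch of the socket() call described above, the helper below creates an IPv4 TCP socket and reports a failure with perror(). The name make_tcp_socket is our own and purely illustrative; it is not part of any library.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper (name chosen for this example): create an IPv4
 * TCP socket. Returns the file descriptor, or -1 on error. */
int make_tcp_socket(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);  /* 0: default protocol */

    if (fd < 0) {
        perror("socket");
        return -1;
    }
    return fd;
}
```

The returned descriptor is not yet connected to anything; it still has to be turned into a server or client socket with the calls presented next.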

    int bind(int fd, const struct sockaddr *addr, socklen_t len);

where fd is the file descriptor returned by socket(), addr is a pointer to a structure containing the address, and len is the size (in bytes) of that structure. The actual structure type depends on the kind of socket, so struct sockaddr can be viewed as the superclass of the struct actually being used. As C is not object oriented, the programmer has to cast the pointer and pass the size of the actual structure as a parameter. For IPv4 connections, the actual type to use is struct sockaddr_in.
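
The cast and size argument described above might look as follows. This is a sketch under our own assumptions: the helper name bind_loopback is hypothetical, the address is fixed to the loopback address, and the byte-order functions htons() and htonl() are not covered in the text but are needed to fill in struct sockaddr_in.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: bind fd to the given port on the loopback
 * address 127.0.0.1. Returns 0 on success, -1 on error. */
int bind_loopback(int fd, unsigned short port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                    /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */

    /* Cast to the generic "superclass" type and pass the real size. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return -1;
    }
    return 0;
}
```

Passing port 0 asks the operating system to pick any free port, which is convenient for experiments.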

    int listen(int fd, int backlog);

where fd is the socket to use, and backlog is the size of the queue for pending connections.

    int accept(int fd, struct sockaddr *addr, socklen_t *addrlen);

where fd is the socket to use. addr and addrlen are in/out parameters through which the address of the peer socket is filled in. Memory for the struct has to be allocated by the caller, and the size of the allocated memory passed in addrlen. On return, addrlen contains the actual size of the address structure, and if this is larger than what was passed, the address has been truncated. If NULL is passed, nothing is filled in. The return value is a new file descriptor for the accepted connection, or -1 on error.

    int connect(int fd, const struct sockaddr *addr, socklen_t len);

where fd is the socket, addr is the address to connect to, and len is the size of the address struct.

1.1 Closing a file descriptor

When a program no longer needs to use a file (or other resource abstracted as a file, like a socket), or before terminating the program, the file must be closed in order to release any operating system resources, locks, etc. If a socket file descriptor is not properly closed before exiting, the next program that tries to use the same port may get an error saying that the port is in use. A file is closed with a call to

    int close(int fd);

where fd is the file descriptor, and the return value is zero on success. On error, -1 is returned and errno is set to indicate the error. Failing to close a file descriptor may give errors that the file is in use on subsequent attempts to use the file or resource.

Note that with TCP sockets, a call to close() does not directly close the socket. As TCP is connection based, closing the socket involves synchronizing with the remote end, and it is common that the socket (address/port pair) remains unusable afterwards until a timeout in the TCP implementation expires.

To force reuse of a port number, the socket option SO_REUSEADDR can be set, using the function

    int setsockopt(int fd, int level, int optname, const void *optval, socklen_t optlen);

as follows:

    int val = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val))) {
        perror("setsockopt");
        return -1;
    }

Note that the option value is passed as a pointer, and that it is strongly recommended to always check the return value for errors.
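
To tie the calls of this section together, here is a sketch that sets up a server socket and a client socket in the same process and sends four bytes between them over the loopback interface. The function name loopback_round_trip is hypothetical, and getsockname() (used to find the port the system assigned) is not covered in the text. The error paths jump to a single cleanup label, a pattern discussed in Section 4.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: connect a client socket to a server socket over
 * 127.0.0.1 and copy the received bytes into reply (null-terminated,
 * at most size - 1 bytes). Returns 0 on success, -1 on error. */
int loopback_round_trip(char *reply, size_t size)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int val = 1, result = -1;
    int server = -1, client = -1, conn = -1;
    ssize_t n;

    if (size == 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);          /* 0: let the system pick a port */

    /* Server side: socket, (option), bind, listen. */
    server = socket(AF_INET, SOCK_STREAM, 0);
    if (server < 0) { perror("socket"); goto done; }
    if (setsockopt(server, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val))) {
        perror("setsockopt"); goto done;
    }
    if (bind(server, (struct sockaddr *)&addr, sizeof(addr))) {
        perror("bind"); goto done;
    }
    if (listen(server, 1)) { perror("listen"); goto done; }
    /* Ask which port was actually assigned. */
    if (getsockname(server, (struct sockaddr *)&addr, &len)) {
        perror("getsockname"); goto done;
    }

    /* Client side: the connection waits in the backlog until accept(). */
    client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0) { perror("socket"); goto done; }
    if (connect(client, (struct sockaddr *)&addr, sizeof(addr))) {
        perror("connect"); goto done;
    }
    conn = accept(server, NULL, NULL);  /* peer address not needed here */
    if (conn < 0) { perror("accept"); goto done; }

    if (write(client, "ping", 4) != 4) { perror("write"); goto done; }
    n = read(conn, reply, size - 1);
    if (n < 0) { perror("read"); goto done; }
    reply[n] = '\0';
    result = 0;

done:
    if (conn >= 0) close(conn);
    if (client >= 0) close(client);
    if (server >= 0) close(server);
    return result;
}
```

In a real application the server and the client are separate programs, and accept() blocks until a client connects; running both ends in one process only works here because the pending connection is queued in the backlog.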

2 File I/O

For terminal I/O, sockets, and many other forms of I/O, the operating system exposes the communication channel as a file, and accessing it involves two important concepts:

File descriptors are integers representing (opened) files as operating system resources, and are used for opening, closing, low-level configuration and access to the files or devices.

Streams are high-level channels that allow sending and receiving a stream of characters. Streams provide a richer interface with formatted I/O, buffering, etc., and also abstract the underlying resources and provide a unified interface. In POSIX, a stream is represented as a FILE *.

This section describes both ways to read and write a file. As a general hint, it is recommended to use the structured functions for output (i.e., the different fprintf() variants), but for simple tasks it is often less error prone to use the low-level function for input (i.e., read()). One reason for that, in addition to buffering and blocking, is that the structured input functions retain the end-of-line character(s) [1] and add a terminating null to the string, whereas read() just gives the actual characters received.

2.1 Raw file access using file descriptors

The low-level functions for reading and writing a file are read() and write(), which read (write) at most nbyte bytes from (to) the resource behind file descriptor fd. The prototypes are:

    ssize_t read(int fd, void *buf, size_t nbyte);
    ssize_t write(int fd, const void *buf, size_t nbyte);

Please note that neither the C compiler nor the runtime system does any range checks, so it is up to the programmer to ensure that the size of buf is at least nbyte, or memory used by other variables may get overwritten with undefined consequences. We let examples from the manual pages illustrate the usage. The first example reads data from the file associated with the file descriptor fd into the buffer pointed to by buf.

    #include <sys/types.h>
    #include <unistd.h>

    char buf[20];
    size_t nbytes;
    ssize_t bytes_read;
    int fd;   /* assumed to refer to an open file */

    nbytes = sizeof(buf);
    bytes_read = read(fd, buf, nbytes);

[1] Depending on system configuration, the end-of-line (EOL) marker may be either a single newline character (\n) or a carriage return followed by a newline (\r\n). If you don't understand this, don't worry; just use read() and write your own input parsing.
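
Since read() returns at most the requested number of bytes, a single call may deliver less than expected, especially on sockets. A common way of dealing with that is to call read() in a loop. The following is a sketch only; the helper name read_until_eof is our own, and it relies on read() returning 0 at end of file.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read from fd until buf is full or end of file.
 * Returns the number of bytes stored in buf, or -1 on error. */
ssize_t read_until_eof(int fd, char *buf, size_t size)
{
    size_t total = 0;

    while (total < size) {
        ssize_t n = read(fd, buf + total, size - total);

        if (n < 0)
            return -1;   /* error: errno has been set by read() */
        if (n == 0)
            break;       /* end of file: the other end was closed */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

Note that the loop never asks for more than the space that remains in buf, which is the range check the programmer is responsible for.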

The second example writes data from the buffer pointed to by buf to the file associated with the file descriptor fd.

    #include <sys/types.h>
    #include <string.h>
    #include <unistd.h>

    char buf[20];
    size_t nbytes;
    ssize_t bytes_written;
    int fd;   /* assumed to refer to an open file */

    strncpy(buf, "This is a test\n", sizeof(buf));
    nbytes = strnlen(buf, sizeof(buf));
    bytes_written = write(fd, buf, nbytes);

Note the operator sizeof for getting the size (in bytes) of a data type or statically known [2] array, and the convenient functions strncpy() and strnlen() (from string.h) for copying and determining the length (the length of the string itself, not the size of the array containing it) of a null-terminated string. To make the code portable, the integer types size_t (from sys/types.h) for unsigned sizes and ssize_t (from unistd.h) for signed sizes (used since read() and write() can return a negative value on error) are used.

2.2 Structured stream I/O

Above the low-level read() and write() functions, the standard library header stdio.h provides powerful functions (e.g., fprintf() and fscanf()) for formatted I/O, including formatting integers as decimal or hexadecimal strings, real numbers, etc. It also includes the function getc(), which reads the next character from a stream as an unsigned char converted to an int, and the function fgets(), which reads a string from the stream until a newline or end of file (EOF) is encountered and stores it in a buffer.

2.2.1 Function overview

The function for creating a stream from a file descriptor is

    FILE *fdopen(int fd, const char *mode);

where fd is the file descriptor and mode is a string specifying the mode of access. In this course we will typically use "r+", meaning reading and writing. There is also a function

    int fclose(FILE *fp);

for flushing and closing the stream fp and then closing the underlying file descriptor. Upon successful completion 0 is returned. Otherwise, EOF is returned and errno is set to indicate the error.

[2] That means that it works for a variable defined with a statically known size (like char buf[20]) but not for objects referenced by pointers or dynamically allocated buffers. For this course, it is recommended to stick to the former method of allocating buffers.
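
To illustrate how fdopen() and fclose() wrap a file descriptor in a stream, the sketch below sends one formatted line through a pipe and reads it back with fgets(). The pipe is only a stand-in for a socket so that the example runs in a single process; the function name stream_round_trip and the use of pipe() are assumptions of this example. Because each end of a pipe is one-directional, the modes "w" and "r" are used instead of "r+".

```c
#define _POSIX_C_SOURCE 200809L  /* for fdopen() and pipe() */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical demo: write one formatted line to a pipe through a
 * stream and read it back into line (including the newline).
 * Returns 0 on success, -1 on error. */
int stream_round_trip(char *line, int size)
{
    int fds[2];
    FILE *out, *in;
    int result = -1;

    if (pipe(fds) < 0) {   /* fds[0]: read end, fds[1]: write end */
        perror("pipe");
        return -1;
    }
    out = fdopen(fds[1], "w");
    in = fdopen(fds[0], "r");
    if (out == NULL || in == NULL) {
        perror("fdopen");
        if (out != NULL) fclose(out); else close(fds[1]);
        if (in != NULL) fclose(in); else close(fds[0]);
        return -1;
    }

    fprintf(out, "value=%d\n", 42);
    fclose(out);           /* flushes the buffer and closes fds[1] */

    if (fgets(line, size, in) != NULL)
        result = 0;
    fclose(in);            /* also closes fds[0] */
    return result;
}
```

Note that the data is not guaranteed to reach the file descriptor until the stream is flushed, here by fclose(); with sockets, forgetting to flush is a common reason for a peer that appears to hang.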

Then we have the functions for reading one character and one line, respectively, from a stream:

    int fgetc(FILE *stream);
    char *fgets(char *s, int size, FILE *stream);

fgetc() reads the next character from stream and returns it as an unsigned char cast to an int, or EOF on end of file or error. fgets() reads at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte (\0) is stored after the last character in the buffer. fgets() returns s on success, and NULL on error or when end of file occurs while no characters have been read. EOF is a macro defined in stdio.h, typically -1.

Finally, we have the functions for formatted output (input) to (from) streams,

    int fprintf(FILE *stream, const char *format, ...);
    int fscanf(FILE *stream, const char *format, ...);

where stream is the stream to access, format is a string specifying the format of the data, and the remaining arguments (...) depend on the format string.

2.2.2 Formatted output example

Before going through the format specification in detail, we will first give some examples of conversion using printf(), which writes to standard output, the stream connected to the terminal the program is executed in. That is, printf(...) is a shorthand for fprintf(stdout, ...).

The program

    #include <stdio.h>

    int main()
    {
        int x = 17;
        int y = 428;
        int z = 1;
        double f = 13.37;
        double g = ;
        char *s1 = "hej";
        char *s2 = "hopp";
        char b = 0b10100101;
        unsigned char ub = 0b10100101;

        printf("x == %d\n", x);
        printf("%x, %#X, %.4X, %d %u\n", x, x, x, x, x);
        printf("%x, %#X, %.4X, %d %u\n", y, y, y, y, y);
        printf("%#10x, %10.2x, %s\n", x, x, s1);
        printf("%#10x, %10.2x, %s\n", y, y, s2);
        printf("%#10x, %10.2x, %s\n", z, z, "literal");
        printf("%f, %.2f, %2.2f\n", f, f, f);
        printf("%f, %.9f, %2.2f\n", g, g, g);
        // note the difference between signed and unsigned.
        printf("%x, %d, %u\n", b, b, b);
        printf("%x, %d, %u\n", ub, ub, ub);
        printf("%x, %d, %u\n", (unsigned char) b, (unsigned char) b, (unsigned char) b);
        return 0;
    }

produces the output

    x == 17
    11, 0X11, 0011, 17 17
    1ac, 0X1AC, 01AC, 428 428
          0x11,         11, hej
         0x1ac,        1ac, hopp
           0x1,         01, literal
    13.370000, 13.37, 13.37
    , ,
    ffffffa5, -91, 4294967205
    a5, 165, 165
    a5, 165, 165

Note the difference between signed and unsigned integer types as indicated in the code, and how sign extension changes the value of a variable that is assigned a constant with the most significant bit set. If the reader doesn't understand the details of that, it is fine (but do make sure to use unsigned types when dealing with bit patterns etc.). However, we encourage the interested student to work out the details, as it gives valuable insight into two's complement arithmetic. That is quite valuable to the Java programmer, as Java unfortunately does not have unsigned data types and therefore requires great care when doing bit operations.
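
The sign extension mentioned above can be isolated in a few lines. The sketch below uses function names of our own choosing and assumes a two's complement machine with 32-bit int, which is what the output above suggests; it shows the same bit pattern widened as a signed and as an unsigned byte, and the masking idiom that recovers the low byte.

```c
/* Widen a byte to int as a signed value: the most significant bit is
 * copied into the upper bits (sign extension). */
int as_signed(unsigned char byte)
{
    signed char sc = (signed char)byte;  /* 0xa5 becomes -91 here */
    return sc;
}

/* Widen a byte to int as an unsigned value: the upper bits are zero. */
int as_unsigned(unsigned char byte)
{
    return byte;                         /* 0xa5 stays 165 */
}

/* Mask away the sign-extended upper bits, keeping the low 8 bits. */
int low_byte(int value)
{
    return value & 0xff;
}
```

Masking with 0xff, as in low_byte(), is also the usual workaround in Java, where a byte is always signed.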

2.2.3 The format specification

The format string is made up of ordinary characters, which are copied as is, and conversion specifications, each of which causes conversion of the corresponding argument (defined by the order). A conversion specification begins with a % and ends with a conversion character, where the most common ones are

    character   argument type, convert to/from
    d, i        signed decimal notation
    x, X        unsigned hexadecimal notation
    u           unsigned decimal notation
    c           single character (converted to unsigned char)
    f           double in decimal notation [-]mmmm.dddd
    s           char * (null-terminated string)

The conversion specifications also take optional arguments, placed between the % and the conversion character. They are, in order:

  - A minus sign, making the converted argument left adjusted.
  - #, specifying an alternate form. For x and X it means printing the 0x (or 0X) prefix; for f it means always printing the decimal point even if no digits follow it. For others, refer to the manual page.
  - A number specifying the minimum field width.
  - A period, separating the width from the precision.
  - A number specifying the precision. The meaning depends on the conversion type: for floating point values it sets the number of decimals, for integers the minimum number of digits, and for strings the maximum number of characters to be printed.
  - An h if an integer is to be printed as a short, or an l if as a long.

Please note that fprintf() and fscanf() use the format string to decide the number of arguments that follow and their types. If they don't match, the result is undefined.

There are also similar functions for doing the same formatting operations on a string rather than a stream, which may be quite convenient:

    int sprintf(char *str, const char *format, ...);
    int snprintf(char *str, size_t size, const char *format, ...);
    int sscanf(const char *str, const char *format, ...);

The difference between sprintf() and snprintf() is that the latter takes the size of the destination, str, as an argument and writes at most size bytes (including the terminating null byte (\0)) to str. As C has no range checking, sprintf() should be avoided in favour of snprintf() in order to avoid buffer overflows and hard-to-find errors.
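
One detail that makes snprintf() practical is its return value: the number of characters the complete text would have needed, which lets the caller detect truncation. A minimal sketch, with a helper name (format_pair) that is our own:

```c
#include <stdio.h>

/* Hypothetical helper: format "name=value" into buf.
 * Returns 0 if the whole text fit, -1 if it was truncated or on error. */
int format_pair(char *buf, size_t size, const char *name, int value)
{
    int n = snprintf(buf, size, "%s=%d", name, value);

    /* n is the length the full text would have had, not counting
     * the terminating null byte. */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

When the text does not fit, buf still holds a null-terminated prefix of it, so the buffer is never overrun.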

3 Opening a file

We have seen how the file abstraction is used for sockets. To access an actual file, it must first be opened by calling the function open(), with the prototype

    int open(const char *path, int oflag);

where path is a string with the path to the file, and oflag is an integer whose bits represent different modes of opening, creation and access to the file. The return value is the file descriptor, or -1 if an error occurred. Values for oflag are constructed by a bitwise-inclusive OR of flags defined in the header file <fcntl.h>. Applications shall specify exactly one of these three values (file access modes):

    O_RDONLY      Open for reading only.
    O_WRONLY      Open for writing only.
    O_RDWR        Open for reading and writing.

In addition to that, more flags can be set. Most of them define how files shall be created, whether existing files shall be overwritten or appended to, etc. As examples of the kind of behaviour that is controlled by flags we have

    O_DIRECTORY   which causes open() to fail if path is not a directory.
    O_NOFOLLOW    which causes open() to fail if path is a symbolic link.

Please see the manual pages for further information.

4 Error handling examples

The following snippet shows how a file is opened and closed, including error handling.

    const char *filename = "/path/to/file";
    int fd;

    fd = open(filename, O_RDONLY);
    if (fd < 0) {
        // an error occurred. Print message and exit program
        perror("Failed to open file");
        exit(-1);
    }
    // ...
    if (close(fd) < 0) {
        perror("Failed to close file");
    }

If the return value of open() or close() is negative, an error has occurred. perror() is a function printing a message on the standard error output, describing the last error encountered during a call to a system or library function [3].

[3] The mechanism behind this is that when an error occurs, the failing function writes an error code to the global variable errno, and perror() reads that variable and prints a human-readable message corresponding to the error code.
The error codes are defined in errno.h.
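
The errno mechanism can also be used directly, for instance when the error code should be returned to the caller rather than only printed. The sketch below uses a hypothetical helper, try_open, and the function strerror() from string.h, which is not covered in the text; it returns the message that perror() would print for a given error code.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to open path for reading and close it again.
 * Returns 0 on success, or the errno value describing the failure. */
int try_open(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        int saved = errno;  /* save it: later calls may overwrite errno */
        fprintf(stderr, "open %s: %s\n", path, strerror(saved));
        return saved;
    }
    if (close(fd) < 0) {
        int saved = errno;
        perror("close");
        return saved;
    }
    return 0;
}
```

The returned value can be compared with the constants from errno.h; for a path that does not exist, for example, the expected code is ENOENT.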

Another common pattern for handling errors during setup is the use of goto. As C doesn't have exceptions, if an error occurs during initialization, goto is used to jump to the corresponding cleanup section of a function. Thus, something that could be written using nested try blocks in Java, like

    void example(int port) throws SomeException {
        try {
            ServerState server = createsocket();
            try {
                server.bindandlisten(port);
                server.run();
            } finally {
                server.closesocket();
            }
        } catch (java.net.SocketException e) {
            throw new SomeException(e);
        }
    }

is often written using goto in C:

    int example(int port)
    {
        struct server_state server;
        int result = 0;

        init_server(&server);
        if (create_socket(&server)) {
            result = FAILED_TO_CREATE_SOCKET;
            goto no_server_socket;
        }
        if (bind_and_listen(&server, port)) {
            result = FAILED_TO_LISTEN;
            goto failed_to_listen;
        }
        run_server(&server);

    failed_to_listen:
        if (close(server.fd)) perror("closing fd");
    no_server_socket:
        return result;
    }

where the functions create_socket() and bind_and_listen(), and the error codes FAILED_TO_CREATE_SOCKET and FAILED_TO_LISTEN are defined elsewhere in the application. Please note that in C, goto can only be used to jump within a function, unlike Java exceptions, which propagate across function calls. To communicate errors across function calls, the return value is normally used.
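
For readers who want to experiment with the pattern, here is a self-contained variant that compiles on its own. It is a sketch with made-up names (first_byte and the RESULT_ and FAILED_ codes) and replaces the server setup with two simpler resources, a heap buffer and a file; the cleanup labels release them in reverse order of acquisition.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical error codes for this example. */
enum { RESULT_OK = 0, FAILED_TO_ALLOCATE, FAILED_TO_OPEN, FAILED_TO_READ };

/* Hypothetical example: store the first byte of the file at path in *out. */
int first_byte(const char *path, char *out)
{
    int result = RESULT_OK;
    int fd;
    char *buf = malloc(1);

    if (buf == NULL) {
        result = FAILED_TO_ALLOCATE;
        goto no_buffer;
    }
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        result = FAILED_TO_OPEN;
        goto no_file;
    }
    if (read(fd, buf, 1) != 1) {
        result = FAILED_TO_READ;
        goto failed_to_read;
    }
    *out = buf[0];

    /* On success, execution falls through the same cleanup code. */
failed_to_read:
    if (close(fd)) perror("closing fd");
no_file:
    free(buf);
no_buffer:
    return result;
}
```

Each label undoes exactly the steps that had succeeded when the jump was made, which is what the nested finally blocks achieve in the Java version.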

5 Manual pages

On a Unix-like system (including GNU/Linux and Mac OS X), the documentation of system calls and the functions in the C library is accessible through the man system. For instance, to read the documentation for the socket function, give the command

    > man socket

which will open a pager with the manual page:

    SOCKET(2)                Linux Programmer's Manual

    NAME
           socket - create an endpoint for communication

    SYNOPSIS
           #include <sys/types.h>          /* See NOTES */
           #include <sys/socket.h>

           int socket(int domain, int type, int protocol);

    DESCRIPTION
           socket() creates an endpoint for communication and returns a
           descriptor.

Important sections of a manual page include RETURN VALUE and ERRORS.

    RETURN VALUE
           On success, a file descriptor for the new socket is returned.
           On error, -1 is returned, and errno is set appropriately.

At the end of the page is the section SEE ALSO, which contains pointers to related man pages:

    SEE ALSO
           accept(2), bind(2), connect(2), fcntl(2), getpeername(2),
           getsockname(2), getsockopt(2), ioctl(2), listen(2), read(2),
           recv(2), select(2), send(2), shutdown(2), socketpair(2),
           write(2), getprotoent(3), ip(7), socket(7), tcp(7), udp(7),
           unix(7)

Note the names, e.g., socket(2), where (2) means that the page is in section 2 of the manual. For some names (e.g., write) there is a command in section 1 of the manual with the same name, and the command man write will show the first match. The section is specified before the name to look up (or, on some systems, with the option -s <section>).

To get the man page for the function write(2), use the command

    > man 2 write

or

    > man -s 2 write

In the manual, section 2 contains system calls, and section 3 contains library functions (like, for instance, printf). If you don't know what section the manual page is in, the option -a gives all man pages with the given name. For more details, see man(1). Manual pages that give an overview of a topic can be found in section 7 (miscellaneous), for instance socket(7), which documents the socket interface, and ip(7), which describes the IPv4 implementation (i.e., TCP and UDP sockets).

