THE EVALUATION OF OPERANDS AND ITS PROBLEMS IN C++

Proceedings of the South Dakota Academy of Science, Vol. 85 (2006), p. 107

Dan Day and Steve Shum
Computer Science Department
Augustana College
Sioux Falls, SD 57197

INTRODUCTION

In beginning C++ courses, the precedence of operators is a main focus. Students are taught that each operator has a certain precedence, that the mathematical operators behave as they do in mathematics, and that the non-mathematical operators fit somewhere into the same ordering. In short, these courses focus on how operators work on the values of their operands. However, this focus is not the whole story. Every implementation must evaluate the operands in some order to retrieve the values they hold, and, unlike operator precedence, C++ does not prescribe a strict order in which operands are evaluated. Many assume that operands are evaluated in the order suggested by operator precedence, but this is not guaranteed. It is important to know that C++ leaves the order of evaluation of operands unspecified in general, and leaves a certain subset of those evaluations, the ones involving side effects, undefined.

EVALUATING OPERANDS

C++ is not strict about how an implementation evaluates operands (or applies side effects, which are considered later); it leaves the implementation a great deal of freedom. Here is the relevant section in the C++ standard:

    Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified.

In general, the order of evaluating operands is unspecified. This means that there is no particular order an implementation must follow. By contrast, Java guarantees that all operands are evaluated in left-to-right order. In C++, an implementation is free to evaluate operands left to right, right to left, or in any other imaginable order. Moreover, an implementation does not have to be consistent: it may order the operands of one expression differently from those of an identical expression that appears later in the same program. What does this all mean? Take this code for example:

    #include <iostream>

    int g()
    {
        std::cout << "g()" << std::endl;   // announces that g ran
        return 0;
    }

    int f()
    {
        std::cout << "f()" << std::endl;   // announces that f ran
        return 0;
    }

    int main()
    {
        // The two calls may be made in either order.
        return g() + f();
    }

The standard does not say whether the implementation will call function g or function f first. Consequently, it is unknown which text will appear first on the screen. There are two possible ways an implementation could order the evaluation of the operands, and both are equally available to it. Note, however, that a definite output will occur. The program is well-formed, that is, valid according to the C++ standard, so it must produce one of the two possible outputs and must not crash. This behavior must be contrasted with undefined behavior, which is discussed shortly.

SEQUENCE POINTS, SIDE EFFECTS, AND UNDEFINED BEHAVIOR

Side Effects and Sequence Points

As shown, operands are in general evaluated in an unspecified order. However, there is a subset of evaluations that can actually produce undefined behavior. This situation arises when a programmer writes expressions that apply side effects to their operands. C++ defines side effects as follows:

    Accessing an object designated by a volatile lvalue, modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression might produce side effects.

In addition, C++ leaves the time at which a side effect is resolved (the actual point in the execution of the program where the side effect is applied) up to the implementation. All that C++ requires is that an implementation resolve every side effect that arises between two sequence points no later than the second of them. A sequence point is:

    At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.

Sequence points occur mainly at the ends of statements (the semicolon), at function calls, and at a few operators. This is important because most operators do not introduce a sequence point, so many side effects can accumulate before the implementation is required to resolve them. Sequence points are found only at these places:

    There is a sequence point at the completion of evaluation of each full-expression.

    When calling a function (whether or not the function is inline), there is a sequence point after the evaluation of all function arguments (if any) which takes place before execution of any expressions or statements in the function body. There is also a sequence point after the copying of a returned value and before the execution of any expressions outside the function.

    Several contexts in C++ cause evaluation of a function call, even though no corresponding function call syntax appears in the translation unit. The sequence points at function-entry and function-exit (as described above) are features of the function calls as evaluated, whatever the syntax of the expression that calls the function might be.

    In the evaluation of each of the expressions

        a && b
        a || b
        a ? b : c
        a , b

    using the built-in meaning of the operators in these expressions, there is a sequence point after the evaluation of the first expression.

The importance of sequence points lies in what a programmer can and cannot do between them. While C++ gives the implementation a great deal of freedom, it requires programmers to observe certain rules about how many side effects may be applied to a single operand between two sequence points.
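As a hedged illustration of the rules quoted above, the following sketch relies on two of the listed sequence points: the end of a full-expression and the built-in && operator. The names call_log, sum_in_fixed_order, and is_positive are hypothetical and do not come from the standard, and g and f here are simplified stand-ins that record their names in a string instead of printing them.

```cpp
#include <string>

// Hypothetical log: each function appends its name so the call order is observable.
static std::string call_log;

int g() { call_log += "g"; return 1; }
int f() { call_log += "f"; return 2; }

// Each initializer is its own full-expression, so there is a sequence
// point between the two calls and g() always completes before f() starts.
int sum_in_fixed_order()
{
    int a = g();
    int b = f();
    return a + b;
}

// The built-in && has a sequence point after its left operand and does not
// evaluate the right operand when the left one is false, so the pointer is
// never dereferenced when it is null.
bool is_positive(const int* p)
{
    return p != 0 && *p > 0;
}
```

Calling sum_in_fixed_order() once returns 3 and leaves call_log equal to "gf". Writing return g() + f(); instead would permit either "gf" or "fg", since the + operator introduces no sequence point between its operands.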
THE PROBLEM WITH SEQUENCE POINTS

A problem arising from sequence points is that an operand can be given multiple side effects between two sequence points. Consider this piece of code:

    int i = 0;
    i = 5 + i++;

Within the second statement (the declaration is of no concern here), there is only one sequence point, the semicolon, and two side effects, the assignment and the increment. The problem is what value i holds after the semicolon: 5 or 1? Since an implementation can order side effects freely, a programmer would think either is possible. However, the above code actually produces undefined behavior. Undefined behavior means that the standard places no requirements on the program, which may produce any result, including crashing during execution. The undefined behavior is caused by the attempt to apply multiple side effects to i between two sequence points.

Why is the above program undefined? It is because C++ limits the side effects that may be applied to any one operand between sequence points. Here is the relevant section:

    Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined.

A programmer must be careful about this rule. An operand can be modified only once between sequence points. Its value can be read multiple times, but if the operand is also modified, every read must serve to determine the value being stored. It is important to realize the seriousness of this rule. Breaking it can easily create a program whose behavior is undefined, making it non-portable and potentially unusable.

EVALUATION, SIDE EFFECTS, AND OPERATOR OVERLOADING

The rules for evaluating operands are mostly inherited from C. When the topic of sequence points and undefined behavior is brought up, it is usually discussed in terms of built-in types, such as ints, chars, and pointers. However, C++ adds operator overloading, so it is worth asking how sequence points relate to user-defined objects and overloaded operators. Consider this example:

    class X
    {
    public:
        X operator++(int);              // implementation omitted
        X& operator=(const X& other);   // implementation omitted
    };

    int main()
    {
        X x;
        x = x++;
        return 0;
    }

With a built-in type, this would clearly be undefined behavior. However, the type is user-defined, so it is necessary to consider the nature of operator overloading. That programmers can use the operators directly is a matter of convenience; for the program to work, the compiler must translate the code into calls to the functions defined in the class. Consider main re-written as follows:

    X x;
    x.operator=(x.operator++(0));

For the purposes of this article, operator overloading is nothing more than calling member functions of a class. A compiler translates the original code into one that uses those functions. The presence of functions adds several sequence points: one before each function body begins executing, after its arguments have been evaluated, and one after each function returns. Because of these additional sequence points, the expression does not produce undefined behavior, as it would if a built-in type had been used instead.

SUMMARY

While the order of evaluation of operands is a very technical subject in the C++ language, it is nevertheless an important one. Its subtle rules can lead to unexpected behavior in a program. A programmer must remember that it is incorrect to assume any specific order of evaluating operands or any specific point at which side effects are resolved. Assuming anything beyond the very loose rules C++ provides can lead to problems later, during the development and porting of a software application.

REFERENCES

INTERNATIONAL STANDARD ISO/IEC 14882. Programming languages: C++. 10/15/2003.