Programming in C++ 6. Floating point data types

Contents
- Introduction
- Type double
- Type float
- Changing types
- Type promotion & conversion
- Casts
- Initialization
- Assignment operators
- Summary

Introduction
Floating point numbers are essential to almost every scientific calculation. In C++ there are three floating point types:
    float
    double
    long double
The type float is typically represented by 4 bytes (in practice the minimum that can meet the requirements of the ANSI C standard). A double is usually represented by 8 bytes. A long double is usually represented by 10 or 12 bytes, or 16 on some platforms. However, all that is required is that a double provides at least as much precision as a float (so in practice uses at least as many bytes) and that a long double provides at least as much as a double. From many points of view, double is the standard floating point type.

Type double
Identifiers of type double are defined by using the double keyword.
    double c;
    c = 2.997925e8;
The expression for c is typical of a floating point number and may be split into three parts:
    2        the integer part
    997925   the fractional part
    8        the exponent
A decimal point separates the integer and fractional parts, while e or E separates the fractional part from the exponent. A floating point constant must contain a decimal point, an exponent, or both, since otherwise there would be nothing to distinguish it from an integer.

Type double
If there is a decimal point then either an integer or a fractional part must be present. If there is no decimal point then there must be both an integer part and an exponent (but there can be no fractional part, of course).
Valid constants:
    1.0
    0.1
    1e10
    1.1e10
    11e9
    .11e11
Invalid constants:
    .e9        // no integer or fractional part
    1,000.0    // an embedded comma is not permitted
    1 000.0    // an embedded space is not permitted
    1000       // integer constant
    e10        // no integer part
All of the operators introduced for integers, except for the modulus operator %, are valid for floating point identifiers and constants. For the majority of compilers a double of eight bytes gives a range of about 10^-308 to 10^308 and an accuracy of fifteen to sixteen significant decimal digits. There is no operator for raising a number to a power.

Type float
The type float should use at least four bytes, but no more than are used for double, to represent a floating point number. This gives a range of about 10^-38 to 10^38 and an accuracy of about seven significant decimal digits. The type float has two potential advantages over double:
- it uses less memory,
- it can be faster on some hardware.
In C++ there is a tendency to carry out calculations in double and then convert back to float. Variables of type float are defined by the keyword float.
    float pi;
    pi = 3.141593f;
A float constant is distinguished from its double counterpart by means of the suffix f or F. If the suffix is omitted the constant has type double and is converted to float on assignment.

Type long double
Variables of type long double are defined by means of the keywords long double.
    long double gamma;
    gamma = 0.577215664901532860606512L;
A long double constant is distinguished by means of the suffix l or L.

Changing types
The concept of type is needed in order to distinguish the different uses we make of different bytes of memory. However, from time to time it is necessary to convert a value of one type directly into another. This may either be done automatically or we may force the change. Since typing to some extent exists in order to protect us from our own foolishness, we had better understand clearly what we are doing.

Type promotions & conversions
In many situations binary operators have operands of different types:
    double x;
    float y;
    int i;
    long j;
    x = 1 + 3.141593;   // + has int and double operands
    y = 3.141593;       // = has float and double operands
    i = 2;
    j = i;              // = has long and int operands
Such statements can often be avoided by careful programming, but they are valid in C++ since automatic conversions take place for arithmetic expressions containing mixed types. The rules for conversion are such that binary operations involving mixed types are performed using the type best able to handle the operands. There is a hierarchy of conversions, with the first match in the hierarchy being the one that is actually used.

Type promotions & conversions
The hierarchy is as follows (X stands for any type).
First, a floating point conversion is performed if either operand has a floating point type:
    long double and X  -> long double
    double and X       -> double
    float and X        -> float
Otherwise, integral promotions (widenings) are performed on both operands:
    char            -> int or unsigned int
    unsigned char   -> int or unsigned int
    short           -> int or unsigned int
    unsigned short  -> int or unsigned int
Then an integral conversion is performed:
    unsigned long and X  -> unsigned long
    long and unsigned    -> long or unsigned long
    long and X           -> long
    unsigned int and X   -> unsigned int
Demotions (narrowings) may also occur, for example when a value is assigned to a variable of a smaller type.

Casts
An explicit conversion, known as a cast, can be performed by specifying the type, as in the following example:
    double x, y;
    int i;
    i = 4;
    x = double(i);    // cast to double, preferred syntax
    y = (double)i;    // cast to double, older C style
Used in this way, double() or (double) is actually a unary operator.

Initialization
It is possible to combine a definition and an assignment in a single statement, known as an initialization, as in:
    int i = 1, j = 2, k = 3;
Such initializations are also possible for other fundamental data types. Identifiers only need to be defined before they are used. There is no requirement to have all of the definitions at the start of the program. Readability is improved by collecting definitions in one place where possible. However, leaving the definition of an identifier until it can be initialized avoids the common error of using uninitialized variables in expressions.

Assignment operators
It is often necessary to combine an arithmetic operation with an assignment:
    i += 5;    // equivalent to i = i + 5
    j += i;    // equivalent to j = j + i
    x *= y;    // equivalent to x = x * y
    z /= x;    // equivalent to z = z / x
    x -= y;    // equivalent to x = x - y
    i %= j;    // equivalent to i = i % j
Such assignment operators are a convenient shorthand and may help the compiler to produce better code. It is also possible to carry out multiple assignments in one statement:
    i = j = k = 7;
The left hand side of an assignment must be an lvalue. An lvalue is an expression referring to a named region of storage. The analogous term rvalue is less widely used and refers to an expression that can appear on the right hand side of an assignment statement. An rvalue can be read but not assigned to.

Summary
- The floating point types are float, double and long double.
- An explicit cast (or conversion) can be performed as in int(3.14).
- For arithmetic expressions such conversions are automatically inserted by the compiler.