Programming in C++
4. The lexical basis of C++
• Characters and tokens
• Permissible characters
• Comments & white spaces
• Identifiers
• Keywords
• Constants
• Operators
• Summary
Characters and tokens
A C++ program consists of one or more files, each containing a series of characters. Characters are normally represented as one byte per character. A compiler resolves (parses) the C++ program into a series of tokens. There are five types of tokens:
• identifiers
• keywords
• constants
• operators
• other separators
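As a rough illustration of the token types listed above, the sketch below labels each token of a few statements in comments. The names `scaled_total`, `price` and `total` are made up for this example.

```cpp
#include <cassert>

// Hypothetical function, used only to label the tokens in its body.
int scaled_total(int price)
{
    int total;           // keyword 'int', identifier 'total', separator ';'
    total = price * 2;   // identifier, operator '=', identifier,
                         // operator '*', constant '2', separator ';'
    return total;        // keyword 'return', identifier, separator ';'
}

// Small self-check; call it from main() to run it.
void check_scaled_total()
{
    assert(scaled_total(21) == 42);
}
```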
Permissible characters
All permissible characters can be divided into five groups:
• lower case letters: a b c ... x y z
• upper case letters: A B C ... X Y Z
• digits: 0 1 2 3 4 5 6 7 8 9
• special characters: ! % ^ & ( ) _ - + = \ < , > . ? / ~ # : ;
• non-printing characters: blank, carriage return, new line etc.
Comments
Comments are an essential feature of good programming practice. In C++ there are two ways of denoting comments.
Two forward slashes // indicate the start of a comment which continues until the end of the line.
    // This is a comment
    x = 1; // The first part is not a comment
An alternative technique is to use /* to indicate the start of a comment which continues until */ is encountered. Such a comment can continue over many lines.
    /* This is a comment */
    /* This is a
       longer comment */
Comments (continued)
A /* */ comment can include one or more // style comments.
    /* First comment
       // Second comment
       continuation of first comment */
However, /* */ comments cannot themselves be nested. This restriction is a minor nuisance: we cannot exclude a piece of code by commenting it out if it already contains this style of comment!
Blanks are not allowed within the tokens defining either style of comment.
An incomplete /* */ comment can lead to errors.
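The two comment styles can be seen side by side in the short sketch below. The function `comment_demo` is a made-up name; only the comments matter here.

```cpp
#include <cassert>

// A line comment runs from the two slashes to the end of the line.
int comment_demo()
{
    int x = 1;      // only the text after the slashes is a comment

    /* A block comment may span
       several lines. */

    /* A block comment may contain // a line-style comment
       and still continues until its closing delimiter. */
    x = x + 1;

    /* Block comments do not nest: the first closing delimiter ends the
       comment, so wrapping this whole comment in another block comment
       would leave its tail outside the comment and break compilation. */
    return x;
}

// Small self-check; call it from main() to run it.
void check_comment_demo()
{
    assert(comment_demo() == 2);
}
```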
White spaces
Entities such as
• comments
• blanks
• vertical and horizontal tabs
• form feeds
• new lines
are collectively known as white space.
White space is not allowed inside any token, except in character or string constants. The compiler ignores any white space that occurs between tokens. Such white space is effectively a separator.
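A small sketch of these rules follows; the names `spaced_sum` and `spaced_text` are hypothetical. The same expression is written once with no white space between tokens and once spread over several lines, while the blanks inside the string constant are kept.

```cpp
#include <cassert>
#include <cstring>

int spaced_sum(int a, int b)
{
    int compact=a+b;          // no white space needed around '=' and '+'
    int spread  =  a
                   +
                   b ;        // blanks and new lines between tokens are ignored
    assert(compact == spread);
    return compact;
}

// White space inside a string constant is part of the constant.
const char* const spaced_text = "a  b";   // both blanks are preserved

// Small self-check; call it from main() to run it.
void check_white_space()
{
    assert(spaced_sum(2, 3) == 5);
    assert(std::strlen(spaced_text) == 4);
}
```

Note that the blank between `int` and `compact` cannot be dropped: there it acts as the separator between a keyword and an identifier.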
Identifiers
An identifier is a sequence of letters, digits and the underscore character _. Both lower and upper case letters are valid and distinct. In both C and C++ there is a tradition of using mainly lower case, except for e.g. global constants. This tradition makes for easily readable code.
There should be no limit on the number of characters in an identifier.
An important restriction on identifiers is that they must start with a letter or underscore rather than a digit.
    velocity!           // illegal character !
    1velocity           // starts with 1
    v e l o c i t y     // illegal spaces
    initial-velocity    // illegal character -
Identifiers with a leading underscore or embedded double underscores are not recommended, because such identifiers can be produced by the compiler.
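For contrast with the illegal examples above, here is a minimal sketch of identifiers that do follow the rules. All names in it are invented for illustration.

```cpp
#include <cassert>

const int MAX_SPEED = 300;     // upper case by tradition for a global constant

int identifier_demo()
{
    int velocity  = 1;         // letters only
    int Velocity  = 2;         // distinct from 'velocity': case is significant
    int velocity1 = 3;         // digits are allowed after the first character
    int initial_velocity = 4;  // underscore instead of the illegal '-'
    return velocity + Velocity + velocity1 + initial_velocity;
}

// Small self-check; call it from main() to run it.
void check_identifiers()
{
    assert(identifier_demo() == 10);
    assert(MAX_SPEED == 300);
}
```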
Keywords
Keywords are special identifiers whose significance is defined by the language rather than by the programmer.
    asm       auto      break     case      catch     char      class
    const     continue  default   delete    do        double    else
    enum      extern    float     for       friend    goto      if
    inline    int       long      new       operator  private   protected
    public    register  return    short     signed    sizeof    static
    struct    switch    template  this      throw     try       typedef
    union     unsigned  virtual   void      volatile  while
Some compilers reserve additional keywords:
    ada  fortran  pascal  overload  huge  near  far
Constants
Constants, which are also known as literals, can be:
• integer
• floating point
• character
• string
Integer constants
Integer constants consist only of a sequence of digits. A negative integer such as -5 is not a single constant but an integer constant expression: the unary minus operator applied to the constant 5!
An octal (base eight) constant starts with 0 and cannot include the digits 8 and 9.
A hexadecimal (base sixteen) integer constant starts with 0x or 0X and may include the letters a..f or A..F; the case is not significant.
Floating point constants
Floating point constants include a decimal point and/or an exponent. The value following e or E is the exponent.
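The notations for integer and floating point constants can be compared in a short sketch like the one below. The variable names are hypothetical; the values are chosen so that different notations denote the same number.

```cpp
#include <cassert>

const int decimal_value = 17;      // decimal
const int octal_value   = 021;     // leading 0: octal, 2*8 + 1
const int hex_value     = 0x11;    // leading 0x: hexadecimal, 1*16 + 1
const int hex_upper     = 0XFF;    // 0X prefix and upper case digits
const int hex_lower     = 0xff;    // same value: case is not significant
const int negative      = -17;     // unary minus applied to the constant 17

const double with_point    = 1500.0;  // decimal point, no exponent
const double with_exponent = 1.5e3;   // 1.5 times 10 to the power 3

// Small self-check; call it from main() to run it.
void check_numeric_constants()
{
    assert(octal_value == decimal_value);
    assert(hex_value == decimal_value);
    assert(hex_upper == hex_lower);
    assert(negative + decimal_value == 0);
    assert(with_point == with_exponent);
}
```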
Constants (continued)
Character constants
A character inside single quotes, such as 'a', is a character constant and is usually represented internally by one byte.
Certain hard-to-get-at characters are represented by an escape sequence, starting with a backslash. This is the complete list of them:
    new line          \n      backslash       \\
    horizontal tab    \t      question mark   \?
    vertical tab      \v      single quote    \'
    backspace         \b      double quote    \"
    carriage return   \r      octal number    \032
    form feed         \f      hex. number     \x032
    alert (bell)      \a
The generalized escape sequence consists of a backslash followed by up to three octal digits, or \x followed by as many hexadecimal digits as required. For instance, '\61' and '\x31' both represent the character '1' (in ASCII).
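A few of the escape sequences above are used in the sketch below. The constant names are made up, and the comparison with '1' assumes an ASCII-compatible character set.

```cpp
#include <cassert>

const char letter      = 'a';      // ordinary character constant
const char digit_octal = '\61';    // octal 61 is 49, the ASCII code of '1'
const char digit_hex   = '\x31';   // hexadecimal 31 is also 49
const char new_line    = '\n';     // new line
const char quote       = '\'';     // a single quote must be escaped
const char backslash   = '\\';     // so must the backslash itself

// Small self-check; call it from main() to run it.
void check_character_constants()
{
    assert(sizeof(letter) == 1);       // a char occupies one byte
    assert(digit_octal == digit_hex);
    assert(digit_octal == '1');        // assumes ASCII
    assert(new_line != quote);
    assert(backslash != quote);
}
```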
String constants
String constants are sequences of characters between double quotes. Internally a string is represented by an array of characters. The compiler appends the null character \0 after the last character to denote the end.
Operators
An operator is a language-defined token consisting of one or more characters which instructs the computer to perform some action.
    ()  .  []  ->  ::  ->*  .*  &  *  delete  new  sizeof
    !  ~  ++  --  +  -  /  %
    <<  >>  <  <=  >  >=  ==  !=
    ^  |  &&  ||  ?:
    =  +=  -=  *=  /=  %=  <<=  >>=  &=  ^=  |=  ,
Some keywords (new, delete, sizeof) are also operators.
Multi-character operators form a single token. White space is not allowed within a token.
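Both points on this slide, the terminating \0 of a string constant and multi-character operators forming one token, can be checked with a sketch like the following. The names `greeting`, `greeting_size` and `shift_demo` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Stored as the four characters 'a' 'b' 'c' '\0'.
const char greeting[] = "abc";
const std::size_t greeting_size = sizeof(greeting);

int shift_demo(int value)
{
    value <<= 2;        // '<<=' is a single token: shift left and assign
    // value << = 2;    // would not compile: white space splits the token
    value += 1;         // '+=' is a single token as well
    return value;
}

// Small self-check; call it from main() to run it.
void check_strings_and_operators()
{
    assert(greeting_size == 4);        // three letters plus the terminator
    assert(greeting[3] == '\0');
    assert(shift_demo(3) == 13);       // 3 * 4 + 1
}
```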
Summary
• A C++ source file consists of a sequence of tokens, each made up of one or more characters.
• White space (including blanks, new lines, tabs etc.) is not allowed inside tokens.
• There are five types of tokens: identifiers, keywords, constants, operators and other separators.
• Use meaningful names for user-defined identifiers, e.g. flow_rate.
• Identifiers with language-defined significance are known as keywords. Their significance cannot be altered.
• Constants can be integer, floating point, character or string.
• Operators are tokens representing operations such as assignment, addition and multiplication.