Chapter 10 Language Translation

A program written in any high-level language must be translated into machine language before it can be executed. This is done by a special piece of software, the compiler. Compared to assemblers, compilers are very difficult to design, and many person-years have to be spent on them.

One assembly language instruction translates into exactly one machine language instruction. Hence, an assembler essentially just replaces each mnemonic and symbolic address with its numeric equivalent.
On the other hand, one high-level language statement may lead to many machine language instructions. For example, the Pascal statement

    A := B + C - D;

corresponds to the following four instructions:

    LOAD B
    ADD C
    SUBTRACT D
    STORE A

To generate the corresponding instructions, a compiler must thoroughly analyze both the structure (syntax) and the meaning (semantics) of the program, which is complicated and difficult.
What should the compiler do?

When performing translation, the foremost goal is to be correct. The generated machine language program must do exactly what the original program does, no more, no less. For example, the machine code

    LOAD B
    ADD C
    STORE B
    SUBTRACT D
    STORE A

does not correctly translate the statement A := B + C - D, since it overwrites the original data in B.
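The difference between the two translations can be checked with a small simulation. The sketch below assumes a hypothetical one-accumulator machine with only the four opcodes used above; the function name `run` and the starting values are illustrative and are not part of the original material.

```python
def run(program, memory):
    """Execute (opcode, address) pairs on a one-accumulator machine."""
    mem = dict(memory)  # work on a copy so the caller's values survive
    acc = 0             # the single accumulator register
    for op, addr in program:
        if op == "LOAD":
            acc = mem[addr]
        elif op == "ADD":
            acc += mem[addr]
        elif op == "SUBTRACT":
            acc -= mem[addr]
        elif op == "STORE":
            mem[addr] = acc
        else:
            raise ValueError(f"unknown opcode: {op}")
    return mem


start = {"A": 0, "B": 5, "C": 7, "D": 2}

correct = [("LOAD", "B"), ("ADD", "C"), ("SUBTRACT", "D"), ("STORE", "A")]
faulty = [("LOAD", "B"), ("ADD", "C"), ("STORE", "B"),
          ("SUBTRACT", "D"), ("STORE", "A")]

good = run(correct, start)
bad = run(faulty, start)
print(good["A"], good["B"])  # 10 5
print(bad["A"], bad["B"])    # 10 12  (A is right, but B was overwritten)
```

Both programs leave the right value in A, which is why the second one is easy to mistake for a correct translation; only the side effect on B gives it away.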
The second goal is that the resulting machine code should be efficient and concise. For example, to compute the sum $2x_1 + 2x_2 + \cdots + 2x_{50000}$, given the following poorly written Pascal program

    sum := 0.0;
    for i := 1 to 50000 do
      sum := sum + (2.0 * x[i]);

the compiler should generate more efficient code, as if the program had been written as follows:

    sum := 0.0;
    for i := 1 to 50000 do
      sum := sum + x[i];
    sum := 2.0 * sum;
Compilation process

There are generally four phases in the translation process.

1. Lexical analysis: The compiler looks at the individual characters in the source program and groups them into syntactic units, called tokens.
2. Parsing: The sequence of tokens is checked to see whether it forms a syntactically correct program according to the rules of the programming language.
3. Semantic analysis and code generation: The compiler analyzes the meaning of the program and generates the proper code.
4. Code optimization: The compiler tries to make the generated code more efficient.
Lexical analysis

In this phase, a lexical analyzer, which is part of the compiler, reads in a sequence of characters and groups them into tokens. For example, for the following assignment statement

    a = b + 319 - delta;

based on the individual characters a, =, b, +, 3, 1, 9, -, d, e, l, t, a, ;, the analyzer forms the following 8 tokens: a, =, b, +, 319, -, delta, ;.

From now on, the compiler can work at the level of symbols, numbers, and operators.
Token classification

Besides forming tokens, the analyzer also categorizes them. For example, all names are assigned category 1, while all numbers, regardless of their values, are assigned category 2, and so on. We can have the following table.

    Token type    Classification
    symbol        1
    number        2
    =             3
    +             4
    -             5
    ;             6
    ==            7
    if            8
    then          9
    else          10
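A lexical analyzer of this kind can be sketched in a few lines. The version below is only an illustration: it uses the classification numbers from the table above, while the function name `tokenize`, the regular expression, and the rule for what counts as a name are assumptions made for this sketch.

```python
import re

# Classification numbers taken from the table above.
KEYWORDS = {"if": 8, "then": 9, "else": 10}
OPERATORS = {"=": 3, "+": 4, "-": 5, ";": 6, "==": 7}

# A number, a name, or an operator; "==" is tried before "=".
TOKEN_RE = re.compile(r"(\d+)|([A-Za-z_]\w*)|(==|[=+\-;])")


def tokenize(source):
    """Return a list of (token text, classification number) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1  # blanks only separate tokens
            continue
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        number, name, operator = match.groups()
        if number is not None:
            tokens.append((number, 2))
        elif name is not None:
            # Keywords look like names but have their own categories.
            tokens.append((name, KEYWORDS.get(name, 1)))
        else:
            tokens.append((operator, OPERATORS[operator]))
        pos = match.end()
    return tokens


result = tokenize("a = b + 319 - delta;")
print(result)
```

For the statement `a = b + 319 - delta;` this yields eight pairs, with `a`, `b`, and `delta` all in category 1 and `319` in category 2.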
Tokens can be categorized because we only care about what kind of token occurs in which position. For example, the following is a legal assignment, no matter which symbols are used and what value the number has.

    symbol = symbol + number ;

To summarize, the input to a lexical analyzer is a high-level language statement from the source program. Its output is a list of all the tokens contained in the program, together with their classification numbers.

Homework: Exercises 1, 2 and 3.
Parsing

During the parsing phase, a compiler determines whether the recognized tokens fit together in a grammatically correct way, i.e., whether they form a syntactically legal statement of the programming language. For example, the assignment statement

    a = b + c;

is a legal statement, since we can construct a parse tree for it. The parse tree shows how the tokens are grouped together.
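One common way to build such a tree is recursive descent. The sketch below assumes a tiny, hypothetical grammar (assignment: symbol = expression ; and expression: operand followed by any number of + or - operands) and represents the tree as nested tuples. The token categories are the classification numbers from the lexical analysis phase; everything else is an illustrative choice.

```python
SYMBOL, NUMBER, ASSIGN, PLUS, MINUS, SEMI = 1, 2, 3, 4, 5, 6


def parse_assignment(tokens):
    """Parse  symbol = expression ;  and return its parse tree."""
    pos = 0

    def expect(kind):
        # Consume one token of the given category or report a syntax error.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][1] != kind:
            found = tokens[pos][0] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected category {kind}, found {found!r}")
        text = tokens[pos][0]
        pos += 1
        return text

    def operand():
        if pos < len(tokens) and tokens[pos][1] == NUMBER:
            return ("number", expect(NUMBER))
        return ("symbol", expect(SYMBOL))

    def expression():
        tree = operand()
        while pos < len(tokens) and tokens[pos][1] in (PLUS, MINUS):
            operator = expect(tokens[pos][1])
            tree = (operator, tree, operand())  # group left to right
        return tree

    target = expect(SYMBOL)
    expect(ASSIGN)
    value = expression()
    expect(SEMI)
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    return ("=", ("symbol", target), value)


tokens = [("a", 1), ("=", 3), ("b", 1), ("+", 4), ("c", 1), (";", 6)]
tree = parse_assignment(tokens)
print(tree)  # ('=', ('symbol', 'a'), ('+', ('symbol', 'b'), ('symbol', 'c')))
```

A token sequence such as `a = + ;` makes `expect` raise `SyntaxError`, which corresponds to the compiler rejecting a statement for which no parse tree exists.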
Semantics and code generation

During parsing, a compiler deals with the syntax of a statement, i.e., its grammatical structure. But not every syntactically correct sentence makes sense, e.g., "The man big the dog."

This problem is dealt with by checking the semantics of the statement. The compiler analyzes the meaning of the tokens and works out the actions the statement is trying to perform. If the statement does not make sense, it is rejected; otherwise, it is translated into machine language.

Consider the following statement:

    sum = a + b;

Although it is syntactically correct, it still might not make sense if the types of a and b are char and integer, respectively.
Semantic record

The previous example tells us that we have to add some additional information, such as the type of a data item, to the parse tree. In general, we attach a semantic record to every node in the parse tree. For example, below is a more general parse tree for a+b:

Based on this information, the compiler can easily reject an expression that produces the following tree.
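The idea of attaching a semantic record to each node can be sketched as follows. This is a simplified illustration, not a description of any particular compiler: the tree is written as nested tuples, the record is a small dictionary, and the rule that both operands must have exactly the same type is an assumption made for this example (real languages often allow some conversions).

```python
class SemanticError(Exception):
    """Raised when a syntactically valid expression makes no sense."""


def annotate(node, declared):
    """Return a copy of the tree with a semantic record on every node."""
    if node[0] == "symbol":
        name = node[1]
        if name not in declared:
            raise SemanticError(f"{name} is not declared")
        return {"name": name, "type": declared[name], "children": []}

    operator, left, right = node
    left_record = annotate(left, declared)
    right_record = annotate(right, declared)
    # Assumed rule: both operands must have the same type.
    if left_record["type"] != right_record["type"]:
        raise SemanticError(
            f"cannot apply {operator!r} to {left_record['type']} "
            f"and {right_record['type']}"
        )
    return {"name": operator, "type": left_record["type"],
            "children": [left_record, right_record]}


expression = ("+", ("symbol", "a"), ("symbol", "b"))

ok = annotate(expression, {"a": "integer", "b": "integer"})
print(ok["type"])  # integer

try:
    annotate(expression, {"a": "char", "b": "integer"})
    rejected = False
except SemanticError as error:
    rejected = True
    print("rejected:", error)
```

The same tree is accepted or rejected depending only on the declared types, which is exactly the information the parse tree alone does not carry.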
Thus, the first step of code generation has to be semantic analysis, which checks every branch of the parse tree to make sure that it is semantically meaningful. If it is not, the compiler reports the errors. Otherwise, it moves on to the next phase and produces the code. For example, below is a parse tree for x=y+z;
The completed code

If we follow the appropriate procedure, we will translate the above parse tree into the following machine instructions:

    LOAD Y
    ADD Z
    STORE TEMP
    LOAD TEMP
    STORE X
    ...
    X: .DATA 0
    Y: .DATA 0
    Z: .DATA 0
    TEMP: .DATA 0

Homework: Exercise 18.
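A translation like the one above can be produced by walking the parse tree from the bottom up. The sketch below is one simple way to do it, under stated assumptions: the tree is written as nested tuples, every operator node stores its result in a fresh temporary (named TEMP, TEMP1, ... here, a naming scheme invented for this example), and only + and - are supported.

```python
def generate(tree):
    """Return (instructions, data names) for a tree  ('=', target, value)."""
    code = []
    data = []
    temp_count = 0

    def declare(name):
        # Reserve a data cell the first time a name is seen.
        if name not in data:
            data.append(name)

    def expression(node):
        # Emit code for node and return the cell that holds its value.
        nonlocal temp_count
        if node[0] == "symbol":
            declare(node[1])
            return node[1]
        operator, left, right = node
        left_cell = expression(left)
        right_cell = expression(right)
        temp = "TEMP" if temp_count == 0 else f"TEMP{temp_count}"
        temp_count += 1
        code.append(f"LOAD {left_cell}")
        code.append(f"{'ADD' if operator == '+' else 'SUBTRACT'} {right_cell}")
        code.append(f"STORE {temp}")
        declare(temp)
        return temp

    _, target, value = tree
    declare(target[1])
    result_cell = expression(value)
    code.append(f"LOAD {result_cell}")
    code.append(f"STORE {target[1]}")
    return code, data


tree = ("=", ("symbol", "X"), ("+", ("symbol", "Y"), ("symbol", "Z")))
code, data = generate(tree)
for line in code:
    print(line)
for name in data:
    print(f"{name}: .DATA 0")
```

The generated sequence stores the sum in TEMP and immediately loads it again. That redundancy is typical of straightforward code generation and is the kind of waste an optimization phase is meant to remove.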
Code optimization

When compilers first appeared in the 1950s, they were not well accepted. The major reason was that the code they generated was not very efficient, hence the need for code optimization.

There are two types of optimization: local and global. In local optimization, the compiler looks at a very small block of instructions to see whether any improvement can be made. For example, if an expression can be fully evaluated at compile time, it should be. Hence the following constant evaluation:

    LOAD ONE                LOAD TWO
    ADD ONE      =====>     STORE X
    STORE X
Other techniques

We also want to use simpler and less time-consuming operations, hence the following strength reduction:

    LOAD X                  LOAD X
    MULTIPLY TWO  =====>    ADD X
    STORE X                 STORE X

We also want to eliminate unnecessary work:

    LOAD Y                  LOAD Y
    STORE X       =====>    STORE X
    LOAD X                  STORE Z
    STORE Z
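Local optimizations of this kind are often implemented as a "peephole" pass that inspects adjacent instructions. The sketch below implements only the two rewrites shown above, strength reduction and redundant-load elimination. It assumes that TWO names a cell holding the constant 2 and that no jump targets the instruction being removed; the function name `peephole` is illustrative.

```python
def peephole(code):
    """Apply two local rewrites to a list of (opcode, address) pairs."""
    optimized = []
    for op, addr in code:
        if optimized:
            prev_op, prev_addr = optimized[-1]
            # Strength reduction: after LOAD X, 2 * X is computed as X + X.
            if op == "MULTIPLY" and addr == "TWO" and prev_op == "LOAD":
                optimized.append(("ADD", prev_addr))
                continue
            # Redundant load: after STORE X the accumulator still holds X.
            if op == "LOAD" and prev_op == "STORE" and prev_addr == addr:
                continue
        optimized.append((op, addr))
    return optimized


doubled = peephole([("LOAD", "X"), ("MULTIPLY", "TWO"), ("STORE", "X")])
print(doubled)  # [('LOAD', 'X'), ('ADD', 'X'), ('STORE', 'X')]

copied = peephole([("LOAD", "Y"), ("STORE", "X"),
                   ("LOAD", "X"), ("STORE", "Z")])
print(copied)   # [('LOAD', 'Y'), ('STORE', 'X'), ('STORE', 'Z')]
```

Because each rule looks only at an instruction and its immediate predecessor, the pass is cheap, but it cannot find improvements that depend on the structure of a whole loop or procedure.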
Global optimization

We have seen a few local optimizations. Global optimization requires the ability to see the big picture, which is more difficult and is not always done. Below is an example. Given the following code

    sum = 0.0;
    i = 0;
    while (i <= 50000) {
      sum = sum + (2.0 * x[i]);
      i = i + 1;
    }
Global reduction

We can eliminate a large number of multiplications by moving this operation out of the loop.

    sum = 0.0;
    i = 0;
    while (i <= 50000) {
      sum = sum + x[i];
      i = i + 1;
    }
    sum = 2.0 * sum;

Homework: Exercises 22 and 23.
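The effect of this transformation can be checked with a short experiment. The sketch below runs both versions of the loop on made-up data and counts the multiplications; the function names and the test data are assumptions for illustration. Note that floating-point arithmetic is not exactly distributive in general, so the results are compared with a tolerance, and a real compiler may apply this rewrite only when its floating-point rules allow it.

```python
import math


def sum_multiply_inside(x):
    """Multiply every element by 2.0 inside the loop."""
    total = 0.0
    multiplications = 0
    for value in x:
        total = total + 2.0 * value
        multiplications += 1
    return total, multiplications


def sum_multiply_hoisted(x):
    """Add the elements first, then multiply once after the loop."""
    total = 0.0
    for value in x:
        total = total + value
    return 2.0 * total, 1


# Illustrative data for x[0] .. x[50000].
x = [0.25 * k for k in range(50001)]

inside, count_inside = sum_multiply_inside(x)
hoisted, count_hoisted = sum_multiply_hoisted(x)

print(math.isclose(inside, hoisted))  # True
print(count_inside, count_hoisted)    # 50001 1
```

Both versions still perform the same number of additions; only the multiplications inside the loop are saved.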