LESSON 13: LANGUAGE TRANSLATION

Objective

• Interpreters and Compilers
• Language Translation Phases

Interpreters and Compilers

A COMPILER is a program that translates a complete source program into machine code. The whole source code file is compiled in one go, and a complete, compiled version of the file is produced. This can be saved on a secondary storage medium such as a floppy disk or hard drive. This means that:

• The program can only be executed once translation is complete.
• Any changes to the source code require a complete recompilation.

An INTERPRETER, on the other hand, is a program which provides a means by which a program written in a source language can be understood and executed by the CPU line by line. As the interpreter encounters the first line, that line is translated and executed. The interpreter then moves to the next line of source code and repeats the process. This means that:

• The interpreter is a program which is loaded into memory alongside the source program.
• Statements from the source program are fetched and executed one by one.
• No copy of the translation exists, and if the program is to be re-run, it has to be interpreted all over again.

The idea of combining interpreted and native-code text can be traced back to early work by Ershov and others, and continues to this day in the active field of partial evaluation. For languages such as C, which have long been implemented primarily by compilation to native code (an exception is the si system), interpreters are experiencing something of a comeback. For one explanation, consider that Proebsting and others are making interpreters much more efficient by combining compiled with interpreted text. However, even optimized interpretation cannot compete with the speed of execution of native code. Through the use of superoperators, Proebsting has achieved interpretation speeds only 3-9 times slower than executing unoptimized native code; without superoperators, that slowdown is considerably larger.

Interpretation also has several other features to recommend it. Foremost is the interactive nature of interpretation. Interpreted environments can respond immediately to change, without the impediment of lengthy compile times. In interactive applications this can be extremely important.

A couple of other advantages of interpretation that are sometimes overlooked are the possibility of platform independence and smaller distribution file sizes. Native executables are by nature bound to the specific platform for which they were compiled. In contrast, interpretable code, whether a high-level source language or an intermediate bytecode representation, can be designed to be independent of any particular machine. In addition, an efficient bytecode representation can be much smaller than the equivalent native code.

Support for interactive, source-level debugging is also more natural with interpretation. Furnishing such support in a compiled environment is a much more arduous task, requiring the preservation of a mapping from the generated native code to the original source code.

Finally, when comparing the performance of an interpreted environment to that of a compiled environment, the comparison is usually between the speed of interpretation and the speed of execution of native code, without taking into account the time spent producing the native code. This may be an acceptable comparison in some circumstances, but in other situations, such as mobile computing and software development, the time spent in translation is of vital importance.
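To make the dispatch overhead that superoperators attack concrete, here is a minimal interpreter sketch in C. The bytecode set and the fused MUL_ADD superoperator are invented for illustration; this is not Proebsting's actual system. Evaluating 2 * 3 + 4 with plain opcodes costs two trips through the dispatch loop for the arithmetic; the superoperator does the same work in one:

#include <stdio.h>

enum op { PUSH, ADD, MUL, MUL_ADD, PRINT, HALT };   /* MUL_ADD is the superoperator */

int main(void) {
    /* two encodings of "print 2 * 3 + 4" */
    int plain[] = { PUSH, 4, PUSH, 2, PUSH, 3, MUL, ADD, PRINT, HALT };
    int fused[] = { PUSH, 4, PUSH, 2, PUSH, 3, MUL_ADD, PRINT, HALT };
    int *progs[] = { plain, fused };

    for (int p = 0; p < 2; p++) {
        int stack[16], sp = 0, pc = 0, run = 1, *code = progs[p];
        while (run) {                            /* the dispatch loop */
            switch (code[pc++]) {
            case PUSH: stack[sp++] = code[pc++]; break;
            case ADD:  sp--; stack[sp-1] += stack[sp]; break;
            case MUL:  sp--; stack[sp-1] *= stack[sp]; break;
            case MUL_ADD:                        /* one dispatch instead of two */
                sp -= 2;
                stack[sp-1] = stack[sp-1] + stack[sp] * stack[sp+1];
                break;
            case PRINT: printf("%d\n", stack[--sp]); break;
            case HALT: run = 0; break;
            }
        }
    }
    return 0;
}

Both programs print 10; the fused version simply reaches the answer with one fewer opcode dispatch, and dispatch is where an interpreter spends much of its time.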
In other related work, adaptive optimization has been considered for the Self language, with the idea of using a fast compiler to generate initial code while an optimizing compiler recompiles heavily used parts. Kastens has considered how to generate interpreters automatically from compiler specifications. In comparing our work with the above, note that we are not as interested in maintaining two forms of language processors (a compiler and an interpreter) as we are in examining the role of a common, machine-independent representation for applications. While speeding up interpretation is always a win, there is inevitably a performance gap between interpretation and heavily optimized target-machine code.

Perhaps closest in spirit to the paradigm we propose are the new just-in-time (JIT) compilers that are being developed for the Java language. However, these compilers differ from the continuous compilation paradigm that we are proposing in several important respects. First of all, the JIT compiler does not work in tandem with the interpreter; rather, it is meant to replace the interpreter. As classes are loaded into the run-time virtual machine (Java is an object-oriented language), the method pointers in the virtual method table, which in the interpreted version of Java would point to the bytecode of the corresponding methods, are replaced with pointers to the JIT compiler. Then, the first time each method is called, the JIT compiler is invoked to compile the method. The pointer in the virtual method table is then patched to point to the native-code version of the method, so that future calls to the method will jump to the native code. With the JIT compiler in place, no interpretation of the methods ever takes place.
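The virtual-method-table patching just described can be sketched with ordinary C function pointers. The sketch below is purely illustrative (the names vtable, square_stub and square_native are hypothetical, and an ordinary C function stands in for JIT-generated native code): a table slot starts out pointing at a stub, the first call "compiles" the method, and the slot is patched so that later calls go straight to the native version.

#include <stdio.h>

typedef int (*method_fn)(int);

static method_fn vtable[1];           /* a one-slot virtual method table */

static int square_native(int x) {     /* stands in for JIT-generated native code */
    return x * x;
}

static int square_stub(int x) {       /* stands in for a pointer to the JIT compiler */
    printf("first call: compiling method...\n");
    vtable[0] = square_native;        /* patch the table slot to the native code */
    return vtable[0](x);              /* run the freshly compiled method */
}

int main(void) {
    vtable[0] = square_stub;          /* class loading: slot points at the compiler */
    printf("%d\n", vtable[0](3));     /* triggers compilation, then prints 9 */
    printf("%d\n", vtable[0](4));     /* goes straight to native code, prints 16 */
    return 0;
}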

While this is a valid and valuable technique, it differs significantly from the continuous compilation model. With the current implementations of the JIT compilers, compilation is only done on demand (i.e., upon the first call to each method), and execution of the method must wait until compilation is complete. As long as the average size of an individual method tends to be rather small, the wait is not likely to be noticeable. However, it means that compilation speed is of great importance. Consequently, time cannot be spent on optimizing the generated code.

In addition, using a compile-on-demand model in an interactive environment (which is where Java seems to be aimed) leads to inefficiency. In environments characterized by user interaction, the CPU often remains idle, or runs well below 100% capacity, for much of the time while waiting for user input. If some sort of precompiling model (analogous to prefetching in memory management or disk caching) were used in the compile-on-demand system, this idle time could be put to use compiling methods that are likely to be called in the near future. However, the current JIT compiler implementations do not seem to support precompiling (although they almost certainly will in the future).

The JIT compilers are also closely tied to the Java language and run-time system. They do not produce independently executable object files; they are meant only to speed up execution of class methods called by the Java run-time virtual machine. While the continuous compilation paradigm can be applied to Java, it is a considerably more generic approach: it is meant to be applicable to any situation in which traditional compilation or interpretation is used.

Language Translation Phases

Whatever programming language we use, it has to be translated into machine language using what are called translators (compilers for high-level languages, assemblers for assembly languages). There are various issues to address while translating a high-level language, and each issue is taken care of by one of the phases of the compilation process. We will discuss the compilation process briefly here. The structure of a typical compiler is given in Fig. 1.1.

[Fig. 1.1: Typical compiler structure - the source program passes through the phases below to produce the target code.]

1. Lexical Analysis

This phase is also called linear analysis or scanning. The job of this stage of the compiler is to break up the source program into meaningful tokens. (A token is a group of characters that together form a meaningful unit, such as an identifier or an operator.) Take, for example, a statement in a programming language like

result := first + second * 5;

It would be broken up into the following tokens:
1. The identifier result
2. The assignment symbol :=
3. The identifier first
4. The plus sign
5. The identifier second
6. The multiplication sign
7. The number 5

Some of the categories of tokens are: i. identifiers, ii. operators, iii. keywords, etc. A small scanner sketch is given below.
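The following C program is the promised sketch of the scanning phase (simplified for illustration; a real lexical analyzer would also recognize keywords and handle comments and error recovery). It breaks the statement above into the tokens just listed:

#include <stdio.h>
#include <ctype.h>

int main(void) {
    const char *p = "result := first + second * 5;";
    char lexeme[32];

    while (*p) {
        if (isspace((unsigned char)*p)) { p++; continue; }
        int n = 0;
        if (isalpha((unsigned char)*p)) {            /* identifier */
            while (isalnum((unsigned char)*p)) lexeme[n++] = *p++;
            lexeme[n] = '\0';
            printf("identifier: %s\n", lexeme);
        } else if (isdigit((unsigned char)*p)) {     /* number */
            while (isdigit((unsigned char)*p)) lexeme[n++] = *p++;
            lexeme[n] = '\0';
            printf("number: %s\n", lexeme);
        } else if (p[0] == ':' && p[1] == '=') {     /* assignment symbol */
            printf("operator: :=\n");
            p += 2;
        } else {                                     /* single-character symbol */
            printf("operator: %c\n", *p++);
        }
    }
    return 0;
}

Running it prints the identifiers result, first and second, the operators :=, +, * and ;, and the number 5, in source order.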

2. Syntax Analysis

Hierarchical analysis is called parsing or syntax analysis. The job of this phase is to group the tokens into grammatical phrases, which are then used to synthesize output; in short, syntax trees are formed in this phase. [Fig.: parse tree for the assignment statement above.] In simple words, as the name suggests, the job of this phase is to check the syntax of all the statements in the source program. If the syntax is wrong, this phase flags errors.

3. Semantic Analysis

This phase checks the source program for semantic errors. It collects the data type information of all the data declared in the program, so that this information can be used in later phases. One of the important components of semantic analysis is type checking, in which the compiler checks that each operator has operands that are allowed in the language. For example, the + operator can accept two integers, two chars (in C, characters are internally treated as integers), two floats, or any other numeric data. But if we declare two complex numbers (using a structure and typedef) and then try to add them, it is not possible. (This is possible in C++ through the concept called operator overloading.) The essence of this topic is that type checking determines whether the operands are of the correct type. An example where semantic analysis has to be performed is below:

float result;
float firstno;
int secondno;
result = firstno + secondno; // Here the compiler internally promotes secondno to float.

4. Intermediate Code Generation

In this phase, the statements in the source language are translated into equivalent intermediate statements, which can ultimately be used to produce the target statements. The importance of this stage is that it then suffices to write a back-end tool that converts the intermediate code into a new target machine language. There are various forms of this representation: (1) triples, (2) quadruples, (3) abstract machines, and (4) indirect triples. For example, our statement translated into this form would be as follows:

temp1 := inttoreal(5)    // temp1, temp2, temp3 are
temp2 := id3 * temp1     // compiler-generated temporary variables
temp3 := id2 + temp2
id1 := temp3

Here id1 stands for result, id2 for first, and id3 for second. We have also assumed that the variables result, first and second are of float type.

5. Code Optimization

As the name indicates, the job of this phase is to optimize the code generated in the previous phase. There are many places where we can optimize the generated statements. By optimization, we mean reducing the number of generated statements without losing the functionality in any way. We do this so that faster-running machine code will be produced. For example, the statements in step 4 can be optimized to:

temp1 := id3 * 5.0       // optimized intermediate code
id1 := id2 + temp1

Note that the original functionality, i.e. the correct assignment, still takes place. It can never happen that the steps in (4) and those in (5) produce different results.

6. Code Generation

This is the last phase of the compiling process. It produces relocatable machine or assembly code. In this phase, the optimized intermediate code is mapped into target machine instructions. The use of machine registers is of utmost importance here. This is the back-end phase of the compiler, since it depends upon the instruction set of the target machine. For example, the machine code for the above statements might be something like:

MOV id3, R2    // Here R1 and R2 are machine registers;
MUL 5.0, R2    // MOV, MUL and ADD are machine instructions,
MOV id2, R1    // and the operands in each instruction are
ADD R2, R1     // source, destination respectively.
MOV R1, id1

A sketch of how the intermediate code of step 4 could be stored inside a compiler is given below.
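As a sketch of the quadruple form mentioned in step 4 (the struct layout and field names here are our own simplified choice, not a standard one), each intermediate instruction can be held as an operator with up to two operands and a result:

#include <stdio.h>

/* one quadruple: operator, up to two operands, and a result field */
typedef struct {
    const char *op;
    const char *arg1;
    const char *arg2;
    const char *result;
} quad;

int main(void) {
    /* the intermediate code of step 4: result := first + second * 5 */
    quad code[] = {
        { "inttoreal", "5",     "",      "temp1" },
        { "*",         "id3",   "temp1", "temp2" },
        { "+",         "id2",   "temp2", "temp3" },
        { ":=",        "temp3", "",      "id1"   },
    };

    for (int i = 0; i < 4; i++)
        printf("(%s, %s, %s, %s)\n",
               code[i].op, code[i].arg1, code[i].arg2, code[i].result);
    return 0;
}

Printing the table reproduces the four statements of step 4 in (operator, arg1, arg2, result) form; a back-end code generator would walk the same table to emit the machine instructions shown in step 6.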
Language Design Issues

There are basically three issues that have to be addressed while designing a programming language:
1. The underlying computer upon which programs will be executed;
2. The execution model, or virtual computer, that supports the programming language's execution on the actual hardware; and
3. The computational model of the programming language.

Structure and Operation of a Computer

A computer can be defined as an integrated set of algorithms and data structures capable of storing and executing programs.

Actual Computer / Hardware Computer
A computer made up of physical devices such as wires, integrated circuits and the like is called an actual or hardware computer.

Software-Simulated Computer
This is made up of a program which is executed on another machine, so it is a simulation.

A computer consists of six major components that are closely related to programming languages.

[Fig.: Software-simulated computer]

1. Data: The computer must provide different kinds of built-in data items and data structures that can be manipulated.
2. Primitive operations: The computer must provide a set of primitive or built-in functions that enable us to manipulate data.
3. Sequence control: The computer must provide various mechanisms for controlling the execution sequence of the program statements.
4. Data access: The computer must provide mechanisms for controlling the data supplied to the functions.
5. Storage management: The computer must provide facilities for controlling the allocation of memory for programs and data.
6. Operating environment: The computer should provide facilities for communication with external environments, such as external devices.

Examples of Language Translators

In order for us to make the computer perform the tasks we want it to, we have to communicate with the machine. We write our programs in a programming language such as C, C++, VB or Java. But, as you know, since the machine understands and works only with 0s and 1s, it cannot understand these language instructions. So what has to be done is to convert these instructions or statements into machine code, which the computer can understand and execute. (We call this code object code.) Basically, there are two ways in which we can translate our high-level language programs into machine code:
1. Compilation
2. Interpretation

Languages such as C, C++ and Turbo Pascal use compilation; others such as LISP and QBasic use interpretation; and others still, such as Java, use a combination of compilation and interpretation. Both these processes are carried out by what are called COMPILERS and INTERPRETERS respectively. These are themselves programs, whose job is to convert high-level programs into low-level bits.

What is the difference between a compiler and an interpreter?

A situation will help you understand the difference between a compiler and an interpreter. Suppose there are two persons who know only the English language. If they wish to communicate among themselves, there is no language problem. But what if one of them wishes to talk to a Japanese-speaking person? There are two ways in which it can be done:

1. The English-speaking person seeks the help of a third person who knows both English and Japanese. That third person translates the English sentences into Japanese sentences as they are spoken: the English speaker says a sentence in English, the third person hears it and then translates it into Japanese. This process is repeated for each sentence. Note that progress is slow; there are pauses between sentences as the translation takes place. Another thing to note is that the third person does not have to remember the previous sentences that have been spoken.

2. The English speaker writes down in English what he wants to say to the Japanese person. That document is then translated into a Japanese document, i.e. the translator writes down the equivalent Japanese sentences. There are two things to be considered here:
a. There is a delay in the beginning, as the complete English document has to be translated before it can be read.
b. The translated document (in Japanese) can be read at any time, and at whatever speed the Japanese person can read it.

In the above situation, treat the English language as the SOURCE CODE, the Japanese language as the MACHINE CODE, and the Japanese speaker's brain as the CPU. The first way corresponds to interpretation, the second to compilation.
Points to Ponder

• A COMPILER is a program that translates a complete source program into machine code.
• An INTERPRETER, on the other hand, is a program which provides a means by which a program written in a source language can be understood and executed by the CPU line by line.
• Because the interpreter must be loaded into memory alongside the program, there is less space available during execution; a compiler is only loaded into memory during the compilation stage, so only the machine code is resident in memory at run time.
• Once we have a compiled file, we can re-run it any time we want without having to use the compiler each time; with an interpreted language, however, we have to re-interpret the program each time we want to run it.
• Machine code programs run more quickly than interpreted programs.
• However, it is often quicker and easier to make changes with an interpreted program than with a compiled one, and as such development time may be reduced.

Questions

1. What are compilers and interpreters? Discuss in detail.
2. What do you understand by the language translation phases?

Reference

www.cs.wustl.edu

Notes