CMPSC 470 Lecture 01

Topics:
  Overview of compiler
  Compiling process
  Structure of compiler
  Programming language basics

Overview of Compiler

A. Introduction

What is a compiler?

What is an interpreter?
A very brief history of compilers.

1. Initially, programs were written in machine language: numeric codes that represent the actual machine operations to be performed. Writing a program in machine language is extremely time-consuming and tedious.
2. Soon, this form of programming was replaced by assembly language.
3. Later, high-level languages, such as C, were developed; programs written in them are translated into assembly programs or machine language.
Source Program

    int main()
    {
        int a = 1;
        while (a != 10)
            if (a < 10)
                a = a + 1;
            else
                a = a - 1;
    }

Target Program

int main()
  00A91650  push ebp                      function start, initialize stack
  00A91651  mov ebp,esp
  00A91653  sub esp,0d8h
  00A91659  push ebx
  00A9165A  push esi
  00A9165B  push edi
  00A9165C  lea edi,[ebp-0d8h]
  00A91662  mov ecx,36h
  00A91667  mov eax,0cccccccch
  00A9166C  rep stos dword ptr es:[edi]
int a = 1;
  00A9166E  mov dword ptr [a],1           move value 1 to location of a
while (a != 10)
  00A91675  cmp dword ptr [a],0ah         compare value of a and 10
  00A91679  je main+47h (00A91697)        jump to 00A91697 if equal
if (a < 10)
  00A9167B  cmp dword ptr [a],0ah         compare value of a and 10
  00A9167F  jge main+3ch (0A9168Ch)       jump to 00A9168C if greater than or equal
a = a + 1;
  00A91681  mov eax,dword ptr [a]         move value of a to register eax
  00A91684  add eax,1                     add 1 to register eax
  00A91687  mov dword ptr [a],eax         move value in register eax to location of a
else
  00A9168A  jmp main+45h (0A91695h)       jump to 00A91695
a = a - 1;
  00A9168C  mov eax,dword ptr [a]         move value of a to register eax
  00A9168F  sub eax,1                     subtract 1 from register eax
  00A91692  mov dword ptr [a],eax         move value in register eax to location of a
  00A91695  jmp main+25h (0A91675h)       jump to 00A91675
  00A91697  xor eax,eax
  00A91699  pop edi
  00A9169A  pop esi
  00A9169B  pop ebx
  00A9169C  mov esp,ebp
  00A9169E  pop ebp
  00A9169F  ret                           return
Low-level language:

High-level language:

Assembly language:

Why should we learn about compilers?
B. Overview of compiling process

1. A source program may be divided and stored in separate files.
2. The preprocessor combines the separate files into one source program, expands macros, and so on. It then generates a modified source program.
3. The compiler produces an assembly-language program, because using an assembly program is
4. The assembler produces relocatable machine code.
5. The linker links (or combines) the relocatable machine code with other relocatable object files or library files, and generates the target machine code.
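As a small illustration of step 2, the sketch below shows a macro that the preprocessor expands before the compiler proper sees the code. The names `SQUARE` and `grown_area` are made up for this example and do not come from the lecture.

```cpp
// SQUARE is a hypothetical macro used only for illustration.
// The preprocessor replaces every use of SQUARE(x) with ((x) * (x))
// before the compiler proper runs.
#define SQUARE(x) ((x) * (x))

// After preprocessing, the return statement below reads:
//     return ((side + 1) * (side + 1));
int grown_area(int side) {
    return SQUARE(side + 1);
}
```

With GCC or Clang, the `-E` option stops after preprocessing and prints the modified source program, which makes the expansion visible.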
C. Structure of Compiler

The compiler receives the modified source program as a character stream and produces target machine code (or an assembly program). It is composed of two parts: analysis and synthesis. The analysis part is often called the front-end, and the synthesis part is called the back-end.

Front-end:

Back-end:

These two parts are further decomposed into 7 phases:
1.
2.
3.
4.
5.
6.
7.
C.a) Lexical Analysis (or Scanning, Tokenizing)

The lexical analyzer reads the source program as a character stream and groups the characters into lexemes.

C.b) Syntax Analysis (or Parsing)

The syntax analyzer creates a tree-like intermediate representation of the source program that shows its grammatical structure (called a syntax tree).
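The grouping of characters into lexemes described in C.a) can be sketched as follows. This is a minimal illustration, not a full scanner: the token kinds, the `Token` struct, and the `tokenize` function are assumptions made for this example, and every non-alphanumeric character is treated as a one-character operator.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Token kinds for this sketch only; a real language has many more.
enum class TokenKind { Identifier, Number, Operator };

struct Token {
    TokenKind kind;
    std::string lexeme;  // the exact characters grouped into this token
};

// Groups the characters of `source` into lexemes, skipping whitespace.
std::vector<Token> tokenize(const std::string& source) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < source.size()) {
        unsigned char c = static_cast<unsigned char>(source[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace separates lexemes but is not a token itself
        } else if (std::isalpha(c) || c == '_') {
            // Identifier: a letter or '_' followed by letters, digits, or '_'.
            std::size_t start = i;
            while (i < source.size() &&
                   (std::isalnum(static_cast<unsigned char>(source[i])) ||
                    source[i] == '_')) {
                ++i;
            }
            tokens.push_back({TokenKind::Identifier, source.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            // Number: a run of decimal digits.
            std::size_t start = i;
            while (i < source.size() &&
                   std::isdigit(static_cast<unsigned char>(source[i]))) {
                ++i;
            }
            tokens.push_back({TokenKind::Number, source.substr(start, i - start)});
        } else {
            // Simplification: any other character is a one-character operator,
            // so a two-character operator such as != comes out as two tokens.
            tokens.push_back({TokenKind::Operator, std::string(1, source[i])});
            ++i;
        }
    }
    return tokens;
}
```

For the statement `a = a + 1;` this sketch produces six tokens with the lexemes `a`, `=`, `a`, `+`, `1`, and `;`. A syntax analyzer would then consume this token sequence to build the syntax tree.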
C.c) Semantic Analysis

The semantic analyzer checks the source program for semantic consistency, using the syntax tree and the symbol table.

C.d) Intermediate Code Generation

The intermediate code generator generates an explicit low-level or machine-like intermediate representation of the source program, which can be thought of as a program for an abstract machine.
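One common form of such an intermediate representation is three-address code, in which each instruction has at most one operator on its right-hand side. The sketch below walks a small syntax tree and emits one instruction per operator node. The `Node` struct, the helper functions, and the temporary names `t1`, `t2`, ... are assumptions made for this illustration.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A syntax-tree node for this sketch: either a leaf (a name or a constant)
// or a binary operation with two children.
struct Node {
    std::string text;             // operator symbol, or the leaf's name/constant
    std::unique_ptr<Node> left;   // null for leaves
    std::unique_ptr<Node> right;  // null for leaves
};

std::unique_ptr<Node> leaf(const std::string& text) {
    auto node = std::make_unique<Node>();
    node->text = text;
    return node;
}

std::unique_ptr<Node> binary(const std::string& op,
                             std::unique_ptr<Node> left,
                             std::unique_ptr<Node> right) {
    auto node = std::make_unique<Node>();
    node->text = op;
    node->left = std::move(left);
    node->right = std::move(right);
    return node;
}

// Appends three-address instructions for `node` to `code` and returns the
// name (variable, constant, or temporary) that holds the node's value.
std::string emit(const Node& node, std::vector<std::string>& code, int& temp_count) {
    if (!node.left) {
        return node.text;  // a leaf is already a name or a constant
    }
    std::string lhs = emit(*node.left, code, temp_count);
    std::string rhs = emit(*node.right, code, temp_count);
    std::string temp = "t" + std::to_string(++temp_count);
    code.push_back(temp + " = " + lhs + " " + node.text + " " + rhs);
    return temp;
}
```

For the tree of `a + b * 60`, built as `binary("+", leaf("a"), binary("*", leaf("b"), leaf("60")))`, calling `emit` with `temp_count` starting at 0 yields the two instructions `t1 = b * 60` and `t2 = a + t1`.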
Three-address code

C.e) Code Optimization (machine-independent)

The code optimizer improves the intermediate code so that better target code results, in terms of:
C.f) Code Generation

The code generator takes an intermediate representation as input and maps it into the target language (e.g., machine-dependent binary code).

C.g) Symbol-Table Management

The symbol table contains records for variables, functions, etc., in the form of a name and its fields, such as:
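One possible shape for such a table is sketched below, as a stack of per-scope maps from names to records. The fields stored in `SymbolInfo` (`kind` and `type`) and the `SymbolTable` interface are assumptions chosen for this illustration; a real compiler typically stores more, such as storage location or parameter lists.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// The fields kept per name are an assumption for this sketch.
struct SymbolInfo {
    std::string kind;  // e.g. "variable" or "function"
    std::string type;  // e.g. "int"
};

class SymbolTable {
public:
    SymbolTable() { enter_scope(); }  // start with the global scope

    void enter_scope() { scopes_.emplace_back(); }

    // The global scope is never popped.
    void exit_scope() {
        if (scopes_.size() > 1) scopes_.pop_back();
    }

    // Records `name` in the innermost scope.
    // Returns false if the name is already declared in that scope.
    bool declare(const std::string& name, const SymbolInfo& info) {
        return scopes_.back().emplace(name, info).second;
    }

    // Searches from the innermost scope outward.
    // Returns nullptr if the name is not declared anywhere.
    const SymbolInfo* lookup(const std::string& name) const {
        for (auto scope = scopes_.rbegin(); scope != scopes_.rend(); ++scope) {
            auto found = scope->find(name);
            if (found != scope->end()) return &found->second;
        }
        return nullptr;
    }

private:
    std::vector<std::unordered_map<std::string, SymbolInfo>> scopes_;
};
```

Searching the scopes from innermost to outermost means that an inner declaration hides an outer one with the same name until `exit_scope` is called.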
C.h) Grouping of Phases into Passes

Some compilers group several phases into a pass.

Example) running each phase as a separate step, with each step producing its own output

Example) running the front-end phases (lexical analysis, syntax analysis, semantic analysis, intermediate code generation) at the same time in one pass

C.i) Compiler-Construction Tools

1. Parser generator
2. Scanner generator
3. Syntax-directed translation engine
4. Code-generator generator
5. Data-flow analysis engine
6. Compiler-construction toolkit
D. Programming Language Basics

D.a) Static policy vs Dynamic policy

The scope of a declaration of (variable) x is the region of the program in which uses of x refer to this declaration.

Static scope or lexical scope

Example)

    int b = 5;
    int foo()
    {
        int a = b + 5;
        return a;
    }
    int bar()
    {
        int b = 2;
        return foo();
    }
    int main()
    {
        foo();    // return
        bar();    // return
        return 0;
    }

Dynamic scope

    int main()
    {
        foo();    // return
        bar();    // return
        return 0;
    }
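The difference between the two scoping rules can be tried out with the sketch below. C++ itself is statically scoped, so the `static_scope` namespace simply repeats the example, while the `dynamic_scope` namespace simulates dynamic scope with an explicit stack of live bindings. The `bindings` stack and the `lookup` helper are assumptions introduced for this simulation.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

namespace static_scope {
int b = 5;

// b refers to the global b, as fixed by the program text.
int foo() { int a = b + 5; return a; }

int bar() {
    int b = 2;   // local to bar; foo cannot see it
    (void)b;     // silence the unused-variable warning
    return foo();
}
}  // namespace static_scope

namespace dynamic_scope {
// Stack of bindings that are currently alive, most recent last.
std::vector<std::pair<std::string, int>> bindings = {{"b", 5}};

// Resolves a name to the most recently created binding that is still alive.
int lookup(const std::string& name) {
    for (auto it = bindings.rbegin(); it != bindings.rend(); ++it) {
        if (it->first == name) return it->second;
    }
    throw std::runtime_error("unbound name: " + name);
}

int foo() { int a = lookup("b") + 5; return a; }

int bar() {
    bindings.emplace_back("b", 2);  // bar's b is visible to everything bar calls
    int result = foo();
    bindings.pop_back();            // the binding dies when bar returns
    return result;
}
}  // namespace dynamic_scope
```

In this sketch, `static_scope::foo()` and `static_scope::bar()` both return 10, because `foo` always sees the global `b`. Under the simulated dynamic scope, `dynamic_scope::foo()` called directly returns 10, but `dynamic_scope::bar()` returns 7, because `foo` finds the `b` declared by its caller.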
Static policy

Dynamic policy

D.b) Environment and State

A variable has three properties: name, location, and value. There are two stages mapping from name to value.
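In the usual formulation, the environment maps names to locations and the state maps locations to values. The sketch below models the two stages with two maps. The `Machine` struct, the integer `Location` type, and the member names are assumptions made for this illustration.

```cpp
#include <string>
#include <unordered_map>

using Location = int;  // stand-in for a memory address

struct Machine {
    std::unordered_map<std::string, Location> environment;  // name -> location
    std::unordered_map<Location, int> state;                // location -> value
    Location next_free = 0;

    // A declaration changes the environment: the name gets a fresh location.
    void declare(const std::string& name) {
        Location loc = next_free++;
        environment[name] = loc;
        state[loc] = 0;  // assumption: new variables start at 0
    }

    // An assignment changes only the state; the environment is untouched.
    void assign(const std::string& name, int value) {
        state.at(environment.at(name)) = value;
    }

    // Reading a variable applies both stages: name -> location -> value.
    int value_of(const std::string& name) const {
        return state.at(environment.at(name));
    }
};
```

After `declare("a")` followed by `assign("a", 1)` and `assign("a", 2)`, the location of `a` is the same as before the assignments, while `value_of("a")` is 2.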
D.c) Function and Parameter Passing

Difference between function, procedure, and method:

Parameter passing mechanism
  call-by-value
  call-by-reference
  call-by-name

What will be printed if the following program uses call-by-name?

    void b()
    {
        cout << "call b" << endl;
    }
    void c()
    {
        cout << "call c" << endl;
    }
    void func(a, b, c)
    {
        if ( a )
            b;
        else
            c;
    }
    int main()
    {
        func(true, b(), c());
    }