Compiler Design Dr. Chengwei Lei CEECS California State University, Bakersfield
The Course
Instructor: Dr. Chengwei Lei
Office: Science III 339
Office Hours: M/T/W 1:00-1:59 PM, or by appointment
Phone: (661) 654-2102
E-mail: clei@csub.edu
Course Description Understanding compilers as a means to implement modern programming languages. Learn some of the fundamental theories and algorithms behind compiler construction. Learn what components are inside every compiler, and know how to build small, simple compilers.
Textbook
No textbook is required for the class. However, you may find a textbook useful as a reference or to learn more details of some of the ideas discussed in the course.
  Engineering a Compiler (Second Edition), Keith D. Cooper and Linda Torczon. Published by Morgan Kaufmann
  Compilers: Principles, Techniques, and Tools (Second Edition), Alfred Aho, Monica Lam, Ravi Sethi, and Jeffrey Ullman. Published by Addison-Wesley
Method of Instruction Lecture; Demonstration; Lab
Semester Grade
  Average of Regular Semester Exams   30%   Exams
  Homework & Labs                     40%   As determined by instructor
  Comprehensive Final                 30%   Final
Late homework submissions
Each assignment is due on the specified date at the specified time. Late submissions will be accepted at a penalty of 20% for each day the assignment is late.
Time
  Lecture: MoWe 4:00 PM - 5:15 PM
  Lab: Tu 4:00 PM - 6:30 PM
  Final: Wed Dec 13 2016 from 2:00 to 4:30 PM
Cheating
You are not allowed to read, copy, or rewrite solutions written by others (in this or previous terms). Copying material from websites, books, or any other source is considered equivalent to copying from another student. If two people are caught sharing solutions, both the person who copied and the person who was copied from will be held equally responsible, and both will receive zero points on the homework. Cheating on an exam will result in failing the course.
9/1/2017
To do well you should:
  Study with pen and paper
  Ask for help immediately
  Practice, practice, practice
  Follow along in class rather than take notes
  Ask questions in class
  Keep up with the class
  Read the book, not just the slides
Getting answers from the internet is CHEATING
Getting answers from your friends is CHEATING
I will send it to the Dean! You will be nailed!
However, teamwork is encouraged.
  Group size at most 3.
  Clearly acknowledge who you worked with.
Do NOT get answers from other groups!
Do NOT do half the assignment while your partner does the other half.
Each of you should try every problem on your own.
Discuss ideas verbally at a high level, but write up your solution on your own.
Please feel free to ask questions!
Help me know what people are not understanding.
We do have a lot of material.
It's your job to slow me down.
Compilers
What is a compiler?
  A program that translates an executable program in one language into an executable program in another language
  The compiler should improve the program, in some way
What is an interpreter?
  A program that reads an executable program and produces the results of executing that program
C is typically compiled, Scheme is typically interpreted
Java is compiled to bytecodes (code for the Java VM)
  which are then interpreted
  Or a hybrid strategy is used: just-in-time compilation
Common mis-statement: X is an interpreted language (or a compiled language)
Comp 412, Fall 2010
Why Study Compilation?
Compilers are important
  Responsible for many aspects of system performance
  Attaining performance has become more difficult over time
    In 1980, typical code got 85% or more of peak performance
    Today, that number is closer to 5 to 10% of peak
    Compiler has become a prime determiner of performance
Compilers are interesting
  Compilers include many applications of theory to practice
  Writing a compiler exposes algorithmic & engineering issues
Compilers are everywhere
  Many practical applications have embedded languages
    Commands, macros, formatting tags
  Many applications have input formats that look like languages
Reducing the Price of Abstraction
Computer Science is the art of creating virtual objects and making them useful.
  We invent abstractions and uses for them
  We invent ways to make them efficient
  Programming is the way we realize these inventions
Well written compilers make abstraction affordable
  Cost of executing code should reflect the underlying work rather than the way the programmer chose to write it
  Change in expression should bring small performance change
  Cannot expect compiler to devise better algorithms
    Don't expect bubblesort to become quicksort
Making Languages Usable
"It was our belief that if FORTRAN, during its first months, were to translate any reasonable scientific source program into an object program only half as fast as its hand-coded counterpart, then acceptance of our system would be in serious danger... I believe that had we failed to produce efficient programs, the widespread use of languages like FORTRAN would have been seriously delayed."
  John Backus on the subject of the first FORTRAN compiler
Simple Examples
Which is faster?

  for (i=0; i<n; i++)
    for (j=0; j<n; j++)
      A[i][j] = 0;
  0.51 sec on 10,000 x 10,000 array

  for (i=0; i<n; i++)
    for (j=0; j<n; j++)
      A[j][i] = 0;
  1.65 sec on 10,000 x 10,000 array

  p = &A[0][0];
  t = n * n;
  for (i=0; i<t; i++)
    *p++ = 0;
  0.11 sec on 10,000 x 10,000 array

All three loops have distinct performance.

Conventional wisdom suggests using
  bzero((void*) &A[0][0],(size_t) n*n*sizeof(int))
  0.52 sec on 10,000 x 10,000 array

A good compiler should know these tradeoffs, on each target, and generate the best code. Few real compilers do.
All data collected with gcc 4.1, -O3, running on a quiescent, multiuser Intel T9600 @ 2.8 GHz
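The variants on this slide can be compared with a small harness along the following lines. This is only a sketch under stated assumptions: the array is a flat heap or stack buffer indexed by hand rather than a true 2-D array, `memset` stands in for `bzero`, and the function names (`zero_row_major`, `zero_col_major`, `zero_pointer`, `zero_memset`, `time_zero`) are hypothetical. Measured times depend on the compiler, flags, and machine, so they will not match the figures quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Row-major traversal: the inner loop walks consecutive addresses. */
void zero_row_major(int *a, size_t n) {
    assert(a != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            a[i * n + j] = 0;
}

/* Column-major traversal: the inner loop strides by n elements. */
void zero_col_major(int *a, size_t n) {
    assert(a != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            a[j * n + i] = 0;
}

/* One pointer-bumping loop over all n*n elements. */
void zero_pointer(int *a, size_t n) {
    assert(a != NULL);
    int *p = a;
    size_t t = n * n;
    for (size_t i = 0; i < t; i++)
        *p++ = 0;
}

/* Library call; memset is used here as a stand-in for bzero. */
void zero_memset(int *a, size_t n) {
    assert(a != NULL);
    memset(a, 0, n * n * sizeof(int));
}

/* Returns the CPU seconds spent in one call of f on an n x n array. */
double time_zero(void (*f)(int *, size_t), int *a, size_t n) {
    clock_t start = clock();
    f(a, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_zero(zero_col_major, a, n)` and `time_zero(zero_row_major, a, n)` on the same large buffer gives a rough view of the locality effect; an optimizing compiler may rewrite some of these loops, which is part of the point of the slide.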
Intrinsic Merit
Compiler construction poses challenging and interesting problems:
  Compilers must process large inputs, perform complex algorithms, but also run quickly
  Compilers have primary responsibility for run-time performance
  Compilers are responsible for making it acceptable to use the full power of the programming language
  Computer architects perpetually create new challenges for the compiler by building more complex machines
  Compilers must hide that complexity from the programmer
A successful compiler requires mastery of the many complex interactions between its constituent parts
Intrinsic Interest
Compiler construction involves ideas from many different parts of computer science
  Artificial intelligence: Greedy algorithms; Heuristic search techniques
  Algorithms: Graph algorithms, union-find; Dynamic programming
  Theory: DFAs & PDAs, pattern matching; Fixed-point algorithms
  Systems: Allocation & naming; Synchronization, locality
  Architecture: Pipeline & hierarchy management; Instruction set use
Why Does This Matter Today?
In the last 4 years, most processors have gone multicore
The era of clock-speed improvements is drawing to an end
  Faster clock speeds mean higher power (n^2 effect)
  Smaller wires mean higher resistance for on-chip wires
For the near term, performance improvement will come from placing multiple copies of the processor (core) on a single die
  Classic programs, written in old languages, are not well suited to capitalize on this kind of multiprocessor parallelism
    Parallel languages, some kinds of OO systems, functional languages
  Parallel programs require sophisticated compilers
Think of the Intel/AMD bet on multicore as a full-employment act for well-trained compiler writers
Short History of Compiler Construction
Formerly "a mystery", today one of the best-known areas of computing
  1957  Fortran  first compilers (arithmetic expressions, statements, procedures)
  1960  Algol    first formal language definition (grammars in Backus-Naur form, block structure, recursion, ...)
  1970  Pascal   user-defined types, virtual machines (P-code)
  1985  C++      object-orientation, exceptions, templates
  1995  Java     just-in-time compilation
We only look at imperative languages. Functional languages (e.g. Lisp) and logical languages (e.g. Prolog) require different techniques.
Why should I learn about compilers?
It's part of the general background of a software engineer
  How do compilers work?
  How do computers work? (instruction set, registers, addressing modes, run-time data structures, ...)
  What machine code is generated for certain language constructs? (efficiency considerations)
  What is good language design?
  Opportunity for a non-trivial programming project
Also useful for general software development
  Reading syntactically structured command-line arguments
  Reading structured data (e.g. XML files, part lists, image files, ...)
  Searching in hierarchical namespaces
  Interpretation of command codes
  ...
Dynamic Structure of a Compiler
character stream:
  val = 10 * val + i
lexical analysis (scanning)
token stream:
  token number   token value
  1 (ident)      "val"
  3 (assign)     -
  2 (number)     10
  4 (times)      -
  1 (ident)      "val"
  5 (plus)       -
  1 (ident)      "i"
syntax analysis (parsing)
syntax tree:
  Statement, Expression, and Term nodes built over
  ident = number * ident + ident
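The scanning step on this slide can be sketched as a function that returns one token per call. The token numbers follow the slide (1 ident, 2 number, 3 assign, 4 times, 5 plus); everything else is an assumption made for illustration, including the `Token` struct, the `next_token` name, the extra `T_EOF` and `T_ERROR` codes, and the 31-character limit on token text.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token numbers 1..5 follow the slide; T_EOF and T_ERROR are added here. */
enum { T_EOF = 0, T_IDENT = 1, T_NUMBER = 2, T_ASSIGN = 3,
       T_TIMES = 4, T_PLUS = 5, T_ERROR = 6 };

typedef struct {
    int kind;        /* token number */
    char text[32];   /* token value: name or digits; empty for operators */
} Token;

/* Scans the next token starting at *src and advances *src past it. */
Token next_token(const char **src) {
    assert(src != NULL && *src != NULL);
    const char *p = *src;
    Token t = { T_EOF, "" };
    size_t len = 0;

    while (isspace((unsigned char)*p)) p++;          /* skip blanks */

    if (isalpha((unsigned char)*p)) {                /* identifier */
        t.kind = T_IDENT;
        while (isalnum((unsigned char)*p)) {
            if (len < sizeof t.text - 1) t.text[len++] = *p;
            p++;
        }
    } else if (isdigit((unsigned char)*p)) {         /* number */
        t.kind = T_NUMBER;
        while (isdigit((unsigned char)*p)) {
            if (len < sizeof t.text - 1) t.text[len++] = *p;
            p++;
        }
    } else if (*p == '=') { t.kind = T_ASSIGN; p++; }
    else if (*p == '*')   { t.kind = T_TIMES;  p++; }
    else if (*p == '+')   { t.kind = T_PLUS;   p++; }
    else if (*p != '\0')  { t.kind = T_ERROR;  p++; } /* unknown character */

    t.text[len] = '\0';
    *src = p;
    return t;
}
```

Calling `next_token` repeatedly on the string `"val = 10 * val + i"` yields the seven tokens listed on the slide, followed by `T_EOF`.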
Dynamic Structure of a Compiler
syntax tree:
  Statement, Expression, and Term nodes built over
  ident = number * ident + ident
semantic analysis (type checking, ...)
intermediate representation:
  syntax tree, symbol table, ...
optimization
code generation
machine code:
  ldc.i4.s 10
  ldloc.1
  mul
  ...
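The code generation step can be illustrated by a post-order walk over a syntax tree that emits code for a stack machine. This is a minimal sketch, not the instruction set shown on the slide: the mnemonics `push`, `load`, `mul`, `add`, and `store` are made up for illustration, as are the `Node` type and the `gen_expr` and `gen_assign` functions. Semantic analysis and optimization are omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { N_NUMBER, N_IDENT, N_MUL, N_ADD } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *text;                 /* digits or name, for leaf nodes */
    const struct Node *left, *right;  /* operands, for N_MUL and N_ADD */
} Node;

/* Appends one instruction line to out; output is truncated if out is full. */
static void emit(char *out, size_t cap, const char *op, const char *arg) {
    size_t used = strlen(out);
    if (used < cap)
        snprintf(out + used, cap - used, "%s%s%s\n", op, arg[0] ? " " : "", arg);
}

/* Post-order walk: code for both operands first, then the operator. */
void gen_expr(const Node *n, char *out, size_t cap) {
    assert(n != NULL && out != NULL);
    switch (n->kind) {
    case N_NUMBER: emit(out, cap, "push", n->text); break;
    case N_IDENT:  emit(out, cap, "load", n->text); break;
    case N_MUL:
    case N_ADD:
        gen_expr(n->left, out, cap);
        gen_expr(n->right, out, cap);
        emit(out, cap, n->kind == N_MUL ? "mul" : "add", "");
        break;
    }
}

/* Statement "target = expr": evaluate expr, then store the result. */
void gen_assign(const char *target, const Node *expr, char *out, size_t cap) {
    assert(target != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    gen_expr(expr, out, cap);
    emit(out, cap, "store", target);
}
```

For the tree of `val = 10 * val + i`, with the multiplication as the left operand of the addition, `gen_assign` produces the lines `push 10`, `load val`, `mul`, `load i`, `add`, `store val`.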
Single-Pass Compilers
Phases work in an interleaved way
  scan token -> parse token -> check token -> generate code for token -> eof?
    n: continue with the next token
    y: done
The target program is already generated while the source program is read.
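A single-pass translator for assignments of the form ident = Expression might look like the following sketch. Scanning, parsing, and code generation are interleaved, and no syntax tree is built. The grammar (Expression = Term { "+" Term }, Term = Factor { "*" Factor }, Factor = ident | number), the stack-machine mnemonics, and all names such as `Compiler` and `compile_assignment` are assumptions made for illustration; type checking is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *src;  /* remaining source text */
    char *out;        /* code generated so far */
    size_t cap;       /* capacity of out */
    int ok;           /* cleared on the first syntax error */
} Compiler;

static void skip_blanks(Compiler *c) {
    while (isspace((unsigned char)*c->src)) c->src++;
}

/* Appends "op" or "op arg" (arg has len characters) to the output. */
static void emit(Compiler *c, const char *op, const char *arg, size_t len) {
    size_t used = strlen(c->out);
    if (c->ok && used < c->cap)
        snprintf(c->out + used, c->cap - used, "%s%s%.*s\n",
                 op, len ? " " : "", (int)len, arg);
}

/* Factor = ident | number. Code is emitted as soon as the token is read. */
static void factor(Compiler *c) {
    skip_blanks(c);
    const char *start = c->src;
    if (isalpha((unsigned char)*c->src)) {
        while (isalnum((unsigned char)*c->src)) c->src++;
        emit(c, "load", start, (size_t)(c->src - start));
    } else if (isdigit((unsigned char)*c->src)) {
        while (isdigit((unsigned char)*c->src)) c->src++;
        emit(c, "push", start, (size_t)(c->src - start));
    } else {
        c->ok = 0;
    }
    skip_blanks(c);
}

/* Term = Factor { "*" Factor }. */
static void term(Compiler *c) {
    factor(c);
    while (c->ok && *c->src == '*') {
        c->src++;
        factor(c);
        emit(c, "mul", "", 0);
    }
}

/* Expression = Term { "+" Term }. */
static void expression(Compiler *c) {
    term(c);
    while (c->ok && *c->src == '+') {
        c->src++;
        term(c);
        emit(c, "add", "", 0);
    }
}

/* Statement = ident "=" Expression. Returns 1 on success, 0 on error. */
int compile_assignment(const char *src, char *out, size_t cap) {
    assert(src != NULL && out != NULL && cap > 0);
    Compiler c = { src, out, cap, 1 };
    out[0] = '\0';

    skip_blanks(&c);
    const char *target = c.src;
    if (!isalpha((unsigned char)*c.src)) return 0;
    while (isalnum((unsigned char)*c.src)) c.src++;
    size_t target_len = (size_t)(c.src - target);

    skip_blanks(&c);
    if (*c.src != '=') return 0;
    c.src++;

    expression(&c);
    if (!c.ok || *c.src != '\0') return 0;
    emit(&c, "store", target, target_len);
    return 1;
}
```

Here the only thing remembered across the statement is the assignment target; everything else is emitted the moment it is recognized, which is why a single pass needs so little memory.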
Multi-Pass Compilers
Phases are separate "programs", which run sequentially
  characters -> scanner -> tokens -> parser -> tree -> sem. analysis -> ... -> code
Each phase reads from a file and writes to a new file
Why multi-pass?
  if memory is scarce (irrelevant today)
  if the language is complex
  if portability is important