Compilers for Modern Architectures
Course Syllabus, Spring 2015

Instructor: Dr. Rafael Ubal
Email: ubal@ece.neu.edu
Office: 140 The Fenway, 3rd floor (see detailed directions below)
Phone: 617-373-3895
Office hours: Wednesday 10-11am and Thursday 3-4pm
Class schedule: Wednesday and Friday, 11:45am-1:25pm, 300 Richards Hall

Overview

This course teaches the structure of real-world compilers, ranging from traditional compilation schemes to recent design trends driven by emerging high-level languages (such as data-parallel programming models) and new computer architectures (such as heterogeneous processors). The course material is organized in two parts:

- The first part covers the compiler front-end, including lexical analyzers, syntax analyzers, and syntax-driven code generators, together with the basic theory supporting their implementation.
- The second part deals with the compiler back-end, including intermediate code representations, code structure analysis, code optimization, and target ISA generation.

Throughout the course, students will progressively implement a full compiler for MiniC, a simplified version of the C programming language. The compiler will produce optimized LLVM code. LLVM is a modern intermediate language used in both open-source and commercial compilers.

Course Objectives

- Understand the modular structure of a compiler and the purpose of a front-end, intermediate language, code generator, code optimizer, back-end, and linker.
- Learn the theory of regular expressions and finite state automata as applied to the implementation of a lexical analyzer.
- Learn the theory of context-free LL(1) grammars as applied to the implementation of an efficient syntax analyzer.
- Become familiar with open-source tools widely used for compiler development, including generators of lexical and syntax analyzers.
- Become familiar with the LLVM intermediate language specification and the ecosystem of open-source tools available for LLVM.
- Understand the motivation for code optimizers through the implementation of optimization passes on target-independent LLVM code.
- Understand the process of code finalization to produce target-specific ISA code through the study of an LLVM-to-x86 back-end.
- Apply all theoretical concepts covered in the course through the progressive development of a full compiler for MiniC, a simplified version of the high-level C language.

Prerequisites

The assignments in this course require you to be familiar with a Unix environment, basic shell commands (ls, cd, make, vi), and the GNU gcc programming environment (gcc, g++, gdb, gprof).

The course project and homework assignments rely on a strong background in C++ programming. In particular, you need to be familiar with object-oriented programming, C++ classes, inheritance, and virtual methods. You also need a good understanding of basic data structures and algorithms, and of their implementation using the C++ standard library containers. This includes arrays, linked lists, binary trees, hash tables, and graph representations.

The following ECE course and reading are recommended before taking this course:

- EECE 7205, Fundamentals of Computer Engineering
- P. Deitel and H. Deitel, C++, How to Program, 9th Edition, ISBN 978-0133378719

Textbooks

The core material presented in this course is drawn from these two textbooks:

- Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools, 2nd Edition, ISBN 978-0321486813
- Steven S. Muchnick, Advanced Compiler Design and Implementation, ISBN 978-1558603202

Grading

Homework         20%
Quizzes          20%
Course project   20% (+10% extra credit)
Midterm exam     20%
Final exam       20%

Homework assignments

There will be a total of 10 weekly homework assignments. Assignments will be posted on Blackboard at least 7 days before their due date, and must be submitted on Blackboard as well. Each homework assignment will typically require you to upload a PDF file with your answers (either hand-written and scanned, or typed) plus a tarball (.tgz file) with your source code, ready to be built and run.

Homework due dates, listed at the end of this document, are strict deadlines with no exceptions. Homework solutions will become available on Blackboard automatically after the due date, so late homework will not be accepted under any circumstances. Please submit your assignments ahead of the deadline to avoid unexpected problems caused by Internet connectivity issues, trouble with PDF document generation, etc.

To add some flexibility to this policy, the homework average will be calculated after discarding one assignment: either the one that received the lowest grade or one that was not submitted on time. This exception is meant to cover any unavoidable situation that prevented you from submitting a homework assignment on time, and it also benefits students with no missing assignments.

Midterm and final exams

A midterm exam will cover the first part of the course material. A comprehensive final exam will focus on the second part of the course, but will also include material from the first part. The dates for both the midterm and the final exam will be announced at the beginning of the semester.
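
As a rough illustration of the grading arithmetic described under Grading and Homework assignments, the C++ sketch below computes a homework average with one assignment dropped, and a weighted course grade. It is not an official grade calculator. The function names are hypothetical, and it assumes that scores are on a 0-100 scale, that a homework not submitted on time is recorded as 0, and that project extra credit is added as up to 10 percentage points on top of the weighted total.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Average of the homework scores after discarding the single lowest one.
// Assumption: a homework that was not submitted on time is recorded as 0,
// so it is the score that ends up being discarded.
double homeworkAverage(const std::vector<double>& scores) {
    assert(scores.size() >= 2);  // at least one score must remain after the drop
    double lowest = *std::min_element(scores.begin(), scores.end());
    double total = std::accumulate(scores.begin(), scores.end(), 0.0);
    return (total - lowest) / static_cast<double>(scores.size() - 1);
}

// Five components weighted 20% each, plus project extra credit.
// Assumption: extra credit is given in percentage points, limited to the
// range 0 to 10, and added on top of the weighted total.
double courseGrade(double homework, double quizzes, double project,
                   double midterm, double finalExam, double extraCredit) {
    double extra = std::min(std::max(extraCredit, 0.0), 10.0);
    return (homework + quizzes + project + midterm + finalExam) / 5.0 + extra;
}
```

For example, under these assumptions a student with homework scores of 100, 80, 90, and one missed assignment (recorded as 0) has the missed assignment discarded and the average taken over the remaining three.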

Course project

During the course, you will work on the development of a compiler for MiniC, a subset of the C high-level programming language. Your compiler will produce LLVM code with support for certain optimization passes. This work will be done either individually or in groups of at most two people.

The last question of each homework assignment will ask you to implement a different module of the compiler, stressing a particular independent functionality. This question will be an open-ended problem, allowing you to choose extra features that will make your compiler more complete, robust, or user friendly.

The course project will consist of putting together the modules developed in each homework assignment to create a fully working MiniC-to-LLVM compiler, together with a reasonable set of tests that demonstrate its capabilities. You can earn up to 10% extra credit by introducing additional functionality, either inspired by the suggestions in the open-ended homework questions or based on ideas of your own. Extra features in your project will be graded based on creativity, novelty, usability, and coding style.

You can also choose to give a 10-minute presentation on your project during the last lecture. Because of time limitations, it may be necessary to select among the volunteer presenters, in which case the selection will be based on the quality of the project.

Finally, students with outstanding projects will be invited to contribute their work to Multi2Sim, an open-source compiler for heterogeneous systems collaboratively developed by Northeastern students, with an increasing impact in the computer architecture research community.

Quizzes

There will be a total of 4 quizzes during the semester, on the dates specified in the schedule at the end of this document. Quizzes will last approximately 20 minutes and will start at the beginning of the lecture time.

Attendance and Punctuality

While attendance at lectures is highly recommended, punctuality in class is indispensable, and constitutes a basic rule of respect toward your instructor and classmates. If any particular reason forces you to come in late to class, please notify your instructor in advance.

Course Topics

Part I

- Introduction, history of compilers, programming languages, pseudo-code conventions for the course.
- Lexical analysis, deterministic and non-deterministic finite automata, regular expressions, automatic generation of scanners, the flex tool.
- Formal grammars, parse trees, context-free grammars, grammar ambiguity, left-factoring, the MiniC grammar.
- Top-down parsing, LL(1) grammars, predictive parsing, recursive-descent parser, parsing tables.
- Bottom-up parsing, LR(0), SLR(1), LR(1), and LALR(1) parsers, automatic generation of parsers, the bison tool.
- Semantic analysis, symbol tables, types and declarations, type checking.
- The LLVM intermediate language, LLVM tool set, static single assignment (SSA) form, phi nodes.

Part II

- Syntax-directed translation, LLVM code generation.
- Basic blocks, control-flow graphs, control-flow analysis, data-flow analysis.
- Optimizations I: constant folding, constant propagation, common-subexpression elimination.
- Optimizations II: loop-invariant code motion, induction-variable optimization, dead code elimination.
- Code finalization, the x86 Instruction Set Architecture (ISA), the LLVM-to-x86 back-end.
- Register allocation, liveness analysis, interference graphs.
- Compilation challenges in new computer architectures: SIMD execution, thread divergence, structural analysis, heterogeneous systems.
- Research opportunities at Northeastern, the Multi2Sim compiler and simulator, class project presentations.

Office Location

1) Find the office building at 140 The Fenway (TF), and enter through the main door located at the parking lot.
2) Take the main elevator to the 3rd floor.
3) Once on the 3rd floor, call me at 617-373-3895. My office is in a locked research laboratory. I will meet you in the hallway right by the elevator and let you in.

[Figure: map showing the parking lot, the main door (1st floor), and the elevator]

Access to the fusion1 Linux System

A Linux machine has been set up with all the software that we will use in class. This machine is accessible through a remote SSH connection to fusion1.ece.neu.edu. You can use your Northeastern login, and the initial password is compilers2015. You will need to change your password the first time you log in to the machine.

If you work on Linux or Mac, you can connect to the fusion1 machine by opening a terminal and typing the following command (assuming smith.j is your login name):

    ssh smith.j@fusion1.ece.neu.edu

If you work on Windows, you can download an SSH client, such as PuTTY, and enter the connection details.

If you don't have a personal machine available, you can use the computer labs of the College of Engineering at 271 Snell Engineering, where both Linux and Windows machines are at your disposal. Both kinds of machines have SSH clients installed, which you can use to connect to fusion1.

Important Dates

Week      Starts   Events
Week 1    1/11
Week 2    1/18
Week 3    1/25     Wednesday 1/28: HW #1 due
Week 4    2/1      Wednesday 2/4: HW #2 due; Quiz #1
Week 5    2/8      Wednesday 2/11: HW #3 due
Week 6    2/15     Wednesday 2/18: HW #4 due; Quiz #2
Week 7    2/22     Wednesday 2/25: HW #5 due
Week 8    3/1      Wednesday 3/4: Midterm exam
3/8 through 3/14: Spring break
Week 9    3/15
Week 10   3/22     Wednesday 3/25: HW #6 due
Week 11   3/29     Wednesday 4/1: HW #7 due; Quiz #3
Week 12   4/5      Wednesday 4/8: HW #8 due
Week 13   4/12     Wednesday 4/15: HW #9 due; Quiz #4
Week 14   4/19     Wednesday 4/22: HW #10 due
4/27 through 5/2: Final exams (exact date TBD)