Compiler Construction
WWW: http://www.cs.uu.nl/wiki/cco
Contact: J.Hage@uu.nl
Edition 2016/2017
Course overview
What is compiler construction about?
Programs are usually written in a high-level programming language, such as Haskell:

    fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
    main = print (fibs !! 42)

To run them on hardware, they need to be translated into machine-executable code:

    ce fa ed fe 07 00 00 00 03 00 00 00 02 00 00 00 0c 00 00 00 00 06 00 00 85 00 00 00 01 00 00 00 38 00 00 00 5f 5f 50 41 47 45 5a 45 52 4f 00...

This translation is typically carried out by a piece of software known as a compiler.
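To make "translation" concrete, here is a minimal sketch, assuming a toy source language of integer expressions and a hypothetical stack machine as the target. All names (Expr, Instr, compile, run, eval) are illustrative and not part of the course software. The compiler itself is a single tree traversal; run plays the role of the hardware.

```haskell
-- Source language: a toy language of integer expressions (hypothetical).
data Expr
  = Lit Int
  | Add Expr Expr
  | Mul Expr Expr
  deriving Show

-- Target language: instructions for a hypothetical stack machine.
data Instr
  = Push Int -- push a constant onto the stack
  | IAdd     -- pop two values, push their sum
  | IMul     -- pop two values, push their product
  deriving (Show, Eq)

-- The compiler: a post-order traversal, so that both operands are
-- on the stack before the instruction for their operator runs.
compile :: Expr -> [Instr]
compile (Lit n)   = [Push n]
compile (Add a b) = compile a ++ compile b ++ [IAdd]
compile (Mul a b) = compile a ++ compile b ++ [IMul]

-- The "hardware": an interpreter for the stack machine. It returns
-- Nothing on stack underflow or if the final stack does not hold
-- exactly one value.
run :: [Instr] -> Maybe Int
run = go []
  where
    go [v] []                   = Just v
    go _ []                     = Nothing
    go st (Push n : is)         = go (n : st) is
    go (y : x : st) (IAdd : is) = go (x + y : st) is
    go (y : x : st) (IMul : is) = go (x * y : st) is
    go _ _                      = Nothing

-- Reference semantics of the source language, for checking the compiler.
eval :: Expr -> Int
eval (Lit n)   = n
eval (Add a b) = eval a + eval b
eval (Mul a b) = eval a * eval b
```

For example, compile (Add (Lit 2) (Mul (Lit 3) (Lit 4))) yields [Push 2, Push 3, Push 4, IMul, IAdd], and running that code yields Just 14, agreeing with eval. A real compiler targets actual machine code, such as the bytes shown above, but the idea of systematically mapping source constructs to target instructions is the same.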
Why study compiler construction?
Most computer scientists will never have to write a compiler for a full-scale programming language such as Haskell, Java, or C. However, many problems studied in compiler construction also show up in other kinds of software, most prominently in consuming, validating, manipulating, and producing structured data. Moreover, compilers are typically excellent examples of well-designed software and of the use of formal methods in software development. A popular trend is the use of so-called domain-specific languages: small languages dedicated to a specific problem domain. Implementing such a language may involve constructing a compiler or an interpreter.
Themes
- Principles of programming languages
- Formal semantics
- Code generation
- Run-time systems
- Type systems
- Metaprogramming
- Generative programming
- Syntax-driven/tree-oriented programming (attribute grammars)
- Theory into practice: everything is implemented
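As a small taste of tree-oriented programming with attribute grammars, here is the classic "repmin" problem (replace every leaf of a tree by the tree's minimum leaf, in a single traversal), sketched in plain lazy Haskell rather than in an attribute-grammar system. The names Tree, visit and repmin are illustrative. The minimum of a subtree plays the role of a synthesized attribute (passed up the tree), the global minimum that of an inherited attribute (passed down), and laziness lets the root feed its own result back in.

```haskell
-- A binary tree with integer leaves (illustrative type).
data Tree = Leaf Int | Bin Tree Tree
  deriving (Show, Eq)

-- visit gm t computes, in a single pass over t:
--   * the synthesized attribute: the minimum leaf value of t (passed up);
--   * the rebuilt tree, whose leaves all hold gm, the inherited
--     attribute (the global minimum, passed down).
visit :: Int -> Tree -> (Int, Tree)
visit gm (Leaf n)  = (n, Leaf gm)
visit gm (Bin l r) = (min ml mr, Bin l' r')
  where
    (ml, l') = visit gm l
    (mr, r') = visit gm r

-- repmin replaces every leaf by the minimum leaf of the whole tree.
-- At the root, the inherited attribute is defined as the synthesized
-- minimum; lazy evaluation makes this circular definition work,
-- because computing the minimum never inspects gm.
repmin :: Tree -> Tree
repmin t = t'
  where
    (m, t') = visit m t
```

For instance, repmin (Bin (Leaf 3) (Bin (Leaf 1) (Leaf 5))) evaluates to Bin (Leaf 1) (Bin (Leaf 1) (Leaf 1)). An attribute-grammar system lets one declare such attributes per nonterminal and typically generates this kind of traversal code automatically.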
What you can expect to get out of this course
- A basic understanding of the design and implementation of compilers and interpreters
- A closer look at typical programming-language constructs
- An introduction to the specification and implementation of type systems for programming languages
- The analysis of first-order and higher-order languages
- Some more advanced topics (to be determined)
What this course is not
- A course on functional programming
- A course on parsing and formal language theory
- A course on combinator-language design
- A course on assembly programming
- A course on computer architecture
- A course on logic and proof theory
- An in-depth course on type theory
Administratrivia
Course form
- Lectures: (about) 2 × 2 hours per week
  - First: focus on the lab exercises
  - Later: capita selecta (selected advanced topics)
- Lab exercises: three in total
  - Attribute grammars for syntax-directed computing (20%)
  - Static analysis of first-order languages (40%)
  - Static analysis of higher-order languages (40%)
- Lab sessions: (about) 2 hours per week
  - The lab exercises train the theory
  - Organisation: work in pairs
  - Early on in the course: more lectures, less lab work
Prerequisites
Participants are assumed to be familiar with the basic concepts of imperative and functional programming. During the course, we will implement compilers, analyzers and/or interpreters in Haskell. Furthermore, experience with combinator-based parsing is assumed.
Course material
- Slides/handouts: made available on the course website
- Software: toy compilers, utility libraries, an attribute-grammar system, and virtual machines
- Reading material: a book and a few papers
- Exercises and assignments
Further reading: Dragon book
Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. Compilers. Principles, Techniques, & Tools. Pearson Education, Boston, Massachusetts, 2nd edition, 2007.
Further reading: Tiger books
Andrew W. Appel. Modern Compiler Implementation in C. Cambridge University Press, Cambridge, 1998.
Andrew W. Appel. Modern Compiler Implementation in Java. Cambridge University Press, Cambridge, 1998.
Andrew W. Appel. Modern Compiler Implementation in ML. Cambridge University Press, Cambridge, 1998.
Further reading: Grune et al.
Dick Grune, Henri E. Bal, Ceriel J. H. Jacobs, and Koen G. Langendoen. Modern Compiler Design. John Wiley & Sons, Chichester, 2000.
Further reading: Mitchell
John C. Mitchell. Foundations for Programming Languages. The MIT Press, Cambridge, Massachusetts, 1996.
John C. Mitchell. Concepts in Programming Languages. Cambridge University Press, Cambridge, 2003.
Further reading: TAPL
Benjamin C. Pierce. Types and Programming Languages. The MIT Press, Cambridge, Massachusetts, 2002.
Benjamin C. Pierce, editor. Advanced Topics in Types and Programming Languages. The MIT Press, Cambridge, Massachusetts, 2005.
(A Little Bit of) History
A-0 system
The first electronic computers were programmed in machine language and, later, in assembly language. Compilation was introduced by Grace Hopper in the A-0 system (1952). In A-0, subroutines were identified by numeric codes, and a call to a subroutine was written by juxtaposing its numeric code and the call arguments. Today, A-0 would be considered a linker rather than a compiler. This work eventually led to Flow-Matic, which influenced the design of COBOL.
FORTRAN
The first compiler for a high-level language was the FORTRAN compiler by John Backus and his team at IBM (1957). Initially, the attitude towards high-level languages was sceptical: they were not expected to compete, performance-wise, with assembly languages. However, the FORTRAN compiler carried out heavy optimizations, resulting in impressively efficient code. Moreover, a typical FORTRAN program was about 20 times smaller than the corresponding assembly program.
1960s and 1970s
COBOL was the first language that could be compiled to multiple platforms (1960). In 1962, Timothy Hart and Michael Levin created the Lisp 1.5 compiler, which was the first bootstrapping compiler, i.e., a compiler written in the language it compiles. During the 1960s and 1970s, the number of proposed programming languages increased rapidly, and the focus shifted from generating fast code towards tools and techniques for implementing compilers and interpreters.
Compilerbau
In 1977, Niklaus Wirth wrote Compilerbau, an influential textbook on compiler construction, in which he presented the stepwise implementation of a compiler for PL/0. Notable features were the use of a recursive-descent parser for syntactic analysis, portable P-code as the target of code generation, and T-diagrams as a means of describing the bootstrapping problem.
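To illustrate the recursive-descent technique, here is a minimal sketch in Haskell for arithmetic expressions rather than for PL/0, assuming the small grammar given in the comments; it is not Wirth's code, and names such as expr, term, factor, chain and parseExpr are illustrative. Each nonterminal becomes one function that consumes a prefix of the input. To keep the sketch short, the parser computes values directly instead of building a syntax tree or emitting P-code.

```haskell
import Data.Char (isDigit, isSpace)

-- Grammar (EBNF); one Haskell function per nonterminal:
--   expr   ::= term { "+" term }
--   term   ::= factor { "*" factor }
--   factor ::= number | "(" expr ")"
-- A parser consumes a prefix of its input and returns the value of
-- the recognised phrase plus the remaining input, or Nothing on a
-- syntax error.
type Parser a = String -> Maybe (a, String)

skip :: String -> String
skip = dropWhile isSpace

expr :: Parser Int
expr s = term s >>= uncurry (chain '+' term (+))

term :: Parser Int
term s = factor s >>= uncurry (chain '*' factor (*))

-- chain op sub f acc s parses the repetition { op sub }, folding the
-- operand values into the accumulator acc from left to right.
chain :: Char -> Parser Int -> (Int -> Int -> Int) -> Int -> Parser Int
chain op sub f acc s = case skip s of
  (c : rest) | c == op -> do
    (v, rest') <- sub rest
    chain op sub f (f acc v) rest'
  _ -> Just (acc, s)

factor :: Parser Int
factor s = case skip s of
  ('(' : rest) -> do
    (v, rest') <- expr rest
    case skip rest' of
      (')' : rest'') -> Just (v, rest'')
      _              -> Nothing
  cs@(c : _) | isDigit c ->
    let (ds, rest) = span isDigit cs
     in Just (read ds, rest)
  _ -> Nothing

-- The whole input must be consumed (apart from trailing spaces).
parseExpr :: String -> Maybe Int
parseExpr s = case expr s of
  Just (v, rest) | all isSpace rest -> Just v
  _                                 -> Nothing
```

For instance, parseExpr "2 + 3 * (4 + 1)" gives Just 17, while parseExpr "2 +" gives Nothing. Because expr calls term and term calls factor, the higher precedence of * over + follows directly from the call structure, mirroring the grammar.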
Since then
Recent decades have been characterized by the emergence of new programming paradigms (object-oriented and functional programming). These rely on run-time facilities, such as garbage collection, that go beyond what typical hardware architectures provide directly. The challenge for implementors is to map advanced high-level language concepts onto native machine languages. Other developments include the advent of widespread concurrent and distributed programming, gradual typing, JIT compilation, language workbenches, resource awareness, certified compilation, and incremental analysis and compilation.
Current challenges
Major challenges include:
- domain-specific optimisation and error diagnosis
- programming support for heterogeneous systems (multicore, FPGA, GPU)
- making dependently typed languages usable