Extending xcom. Chapter Overview of xcom

Similar documents
CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

LECTURE 3. Compiler Phases

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

1. The output of lexical analyser is a) A set of RE b) Syntax Tree c) Set of Tokens d) String Character

4. Lexical and Syntax Analysis

Compiler Design (40-414)

4. Lexical and Syntax Analysis

Lexical and Syntax Analysis

Syntax-Directed Translation

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

Chapter 4. Lexical and Syntax Analysis

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

COMPILER DESIGN. For COMPUTER SCIENCE

Examples of attributes: values of evaluated subtrees, type information, source file coordinates,

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

A Simple Syntax-Directed Translator

Undergraduate Compilers in a Day

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

History of Compilers The term

Syntax-Directed Translation. Lecture 14

Crafting a Compiler with C (II) Compiler V. S. Interpreter

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

Using an LALR(1) Parser Generator

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Lexical Analysis. Introduction

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1

CS 426 Fall Machine Problem 1. Machine Problem 1. CS 426 Compiler Construction Fall Semester 2017

3.5 Practical Issues PRACTICAL ISSUES Error Recovery

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division

CSE 413 Languages & Implementation. Hal Perkins Winter 2019 Structs, Implementing Languages (credits: Dan Grossman, CSE 341)

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology

What do Compilers Produce?

COMPILER CONSTRUCTION Seminar 02 TDDB44

11. a b c d e. 12. a b c d e. 13. a b c d e. 14. a b c d e. 15. a b c d e

Compilers. History of Compilers. A compiler allows programmers to ignore the machine-dependent details of programming.

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

Compiling and Interpreting Programming. Overview of Compilers and Interpreters

Downloaded from Page 1. LR Parsing

CPS 506 Comparative Programming Languages. Syntax Specification

Chapter 3. Describing Syntax and Semantics ISBN

CSCI312 Principles of Programming Languages!

A simple syntax-directed

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019

Compilers. Prerequisites

CS 406/534 Compiler Construction Putting It All Together

Working of the Compilers

Review of CFGs and Parsing II Bottom-up Parsers. Lecture 5. Review slides 1

Chapter 3. Describing Syntax and Semantics

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

Syntax and Grammars 1 / 21

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

ICOM 4036 Spring 2004

CST-402(T): Language Processors

Interpreters. Prof. Clarkson Fall Today s music: Step by Step by New Kids on the Block

Pioneering Compiler Design

Informal Reference Manual for X

2.2 Syntax Definition

The Compiler So Far. CSC 4181 Compiler Construction. Semantic Analysis. Beyond Syntax. Goals of a Semantic Analyzer.

About the Tutorial. Audience. Prerequisites. Copyright & Disclaimer. Compiler Design

CS 4201 Compilers 2014/2015 Handout: Lab 1

Introduction. Compilers and Interpreters

Why are there so many programming languages? Why do we have programming languages? What is a language for? What makes a language successful?

Sardar Vallabhbhai Patel Institute of Technology (SVIT), Vasad M.C.A. Department COSMOS LECTURE SERIES ( ) (ODD) Code Optimization

The Structure of a Syntax-Directed Compiler

Compilers. Compiler Construction Tutorial The Front-end

Building a Parser Part III

CS /534 Compiler Construction University of Massachusetts Lowell

Types and Static Type Checking (Introducing Micro-Haskell)

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

LR Parsing LALR Parser Generators

Question Bank. 10CS63:Compiler Design

Computing Inside The Parser Syntax-Directed Translation. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

Left to right design 1

Types and Static Type Checking (Introducing Micro-Haskell)

Programming Language Syntax and Analysis

Programming Assignment III

Program Assignment 2 Due date: 10/20 12:30pm

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

Software II: Principles of Programming Languages

A programming language requires two major definitions A simple one pass compiler

Semantic Analysis. Lecture 9. February 7, 2018

Object-oriented Compiler Construction

Abstract Syntax Trees

Bottom-Up Parsing. Lecture 11-12

Lecture 8: Deterministic Bottom-Up Parsing

Introduction to Compiler Design

Lex & Yacc. By H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573

CS 406: Syntax Directed Translation

6.184 Lecture 4. Interpretation. Tweaked by Ben Vandiver Compiled by Mike Phillips Original material by Eric Grimson

The Structure of a Syntax-Directed Compiler

Lex & Yacc. by H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Formats of Translated Programs

Chapter 2 A Quick Tour

CS131 Compilers: Programming Assignment 2 Due Tuesday, April 4, 2017 at 11:59pm

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

CS 321 IV. Overview of Compilation

Transcription:

Chapter 3 Extending xcom 3.1 Overview of xcom xcom is compile-and-go; it has a front-end which analyzes the user s program, a back-end which synthesizes an executable, and a runtime that supports execution. The front end has a grammar module, a scanner, a parser and a symbol table builder. The back end has a tree walker, code generators, an assembler and a manager for register spills. The runtime holds c++ code that can be called from generated machine code. Most of the compiler modules are c++ classes. The design emphasizes conventional algorithms and ease of change over more traditional objectives such as speed or code quality. There are two independent global variants to xcom. The first is the target platform. Over the years this has included the ibm, digital, intel and motorola computers. The second is the parsing method, either top-down or bottom-up. Some variants are implemented in the xcom distribution. The variant code itself resides in appropriately named subdirectories. Thus the overall structure of the xcom directory is: /xcom main directory /fe analysis (front end) /cfg grammar processing /scan lex and scan /parse parse and ast building /recur top down /lalr bottom up /sym symbol table /be synthesis (back end) /gen code generation /x86 Intel x86 /ppc Motorola power pc /a64 Intel 64 bit 59

60 CHAPTER 3. EXTENDING XCOM /asm assembler/disassembler /x86 Intel x86 /ppc Motorola power pc /a64 Intel 64 bit /mem memory pool /tex pretty printing /rt runtine routines /bin xcom binaries /x X programs Testing One can test xcom one module at a time (unit tests), or as a whole. Each unit test is run by a standalone driver. Each driver has some output that illustrates some of the function supplied by the module. The drivers are a good place to exercise new code and place new tests. There are several testing targets in the Makefile. Some of them check that error-free X programs get the expected answer. Others check that mistakes get the expected diagnostic. There are also many assertions insuring xcom invariants are maintained. Each major module (e.g., asm.cpp) has a #define/#undef pair that controls tracing-on-the-fly. Interchanging the lines toggles the tracing behavior. The trace output is to stdout; it is useful when one is deep in the details of a module. Typically the (verbose) traces are turned on only during unit testing. Each major module also has a dumper which presents a readable form of the intermediate information. A dump is not available until the module completes its task. Dumping uses callbacks, passing information to the outer layer of the compiler for presentation. Dumping is requested with xcom command line flags. Diagnostics are implemented with exceptions and callbacks, leaving it to the outer layer to decide how to present the comeupance. Runtime errors (e.g. divide by zero) are reported from the assembly code via conventional return codes. A non-zero return code will cause an xcom error to be thrown following the return. Plug and Play Components Technology changes all the time. Today s best is surely going to be replaced in the near future. The cpu-dependent files are in subdirectories named for the cpu (e.g. /x86). Early ancestors of this compiler ran on ibm s/360, digital vax and alpha. More recently it has run on intel x86, power pc and intel 64-bit cpus. The Makefile detects the underlying hardware and chooses the corresponding subdirectories. The choice is between top-down and bottom-up parsing is up to the user. A single module, lang.cpp, containing the source-language dependencies, has two forms. If the bottom-up (like yacc) form is chosen, LALR(1) tables are built and used. Otherwise recursive methods do the parsing. A make variable must be set to select either the parse/recur or parse/lalr subdirectories.

3.2. CHANGING X SYNTAX 61 3.2 Changing x Syntax If one is going to extend x, it is usual to start with the grammar xcom/fe/cfg/x.cfg. If something is added to X.cfg, then new symbols will be made available for use. Do not expect too much from the grammar. The grammar already allows some nonsense. For example x + y is not implemented for logical values. Extending the grammar will almost always add some more nonsense in addition to the desired constructs. The undesired constructs must be screened out (cause diagnostics) later in the process, usually when they fail to produce type-correct expressions. If a bottom-up parser is to be built, the make process may halt in directory /cfg with diagnostics from the LALR(1) table builder. The table builder must be accomodated via grammar changes to make further progress. If a top-down parser is to be built, the same kind of effort will be expended on getting the recursive routines in lang.cpp to work. Adding Vector Syntax Suppose one wished to add vectors to x. The process starts with adding the syntax for a[i]. A new rule has to be added: var id id [ expression ] and the uses of id in factor and vars replaced with var. If one then builds xcom, new tables will be extracted from X.cfg and used in analysis and synthesis. The new terminal symbols [ and ] will be given names lsqrsym and rsqrsym by cfg.awk. You will want to propogate these changes into the lexer, parser, and all the downstream parts of xcom. You can see the tabulated grammar data in X.macros. Adding Set Syntax Suppose, on the other hand, one wish to add sets to xcom. This is somewhat more complicated than adding subscripts. One might add additional rules to factor: factor { exprs } { } size factor choice factor

62 CHAPTER 3. EXTENDING XCOM and, assuming that the relations (e.g. <, <=) will be interpreted as the analogous set operations (e.g., include, include-or-equal), + as set union, * as set conjunction, one adds just one new relation. sum in sum As before, new nonterminals (in this case lcurlsym and rcurlsym) will become available for use throughout the rest of xcom. And, as before, some syntactically allowed texts (such as set division) will be nonsense. Adding String Syntax Adding a new literal (e.g. c-style string) requires editing the table generator as well as the grammar. One first adds a name string into the X.cfg rule for factor analogous to integer and real. If nothing else is done, string will be taken as a reserved word. One needs, in addition, to add string to the list in function isnotreserved in lextables.cpp. This change keeps string out of the reserved word list, and enables an extension of the lexer to handle string literals. One needs some string operators. One can use + for catenate, as is done in Java, and the relationals for comparisons; these are already in the grammar. Function such as substr and strlen can be added in analogy to i2r and r2i. Type as Syntax One can elaborate X.cfg to make type-correctness a syntatic issue. For example, there would be separate rules for em intsum + intterm em realsum + realterm One consequence is adding about 30 rules to X.cfg. Another is to force parsing decisions to depend on the knowledge of type. There is such a grammar (Xtype.cfg) in the xcom distribution. It is there as an example of something to avoid, rather than something to emulate. Unit Test The make target cfgtest will process X.cfg and print the information extracted from it. 3.3 Changing Lex and Scan The lexer is based on the assumption that the entire source text of an x file is read and placed in contiguous memory. The lexer deals directly with memory image of the source text. Anytime a new symbol is added, the lexer potentially needs to analyze a new case. Some situations, such as adding single-character symbols, operator identifiers, and

3.4. CHANGING PARSE 63 reserved words, are handled automatically by the grammar processing module. Such input symbols will be given new standard names and passed on to the rest of the compiler without further work by the developer. Input code for new literals, such as strings, needs to be added. There are three parts to a token: its starting address in the input text, its length, and a code identifying the kind of token. The lexer fills in a c struct and reports its find via a pure virtual function. The scanner implements the virtual function and queues up or discards the result. The scanner typically does not have to be changed. It is a navigation layer on top of the stream of lexical information. Unit Test The program scandriver.cpp contains a set of tests exercising the lexer. It is a good place to add new tests for extensions. Dump The flag -lex on the xcom command line will cause xcom to dump the sequence of lexemes in a readable format after processing the whole input file. Trace define/undef macros at the head of scan.cpp can be interchanged and the scanner rebuilt to give a verbose on-the-fly presentation of scanning. 3.4 Changing Parse The parse machinery is defined in parse.cpp. It rarely needs changing. The parser itself resides in lang.cpp. Its entry, method parse implements a pure virtual function defined in parse.cpp. The rest of the Lang object is private; it reports the shift/reduce sequence to the abstract tree builder via functions of that name defined in the parse machinery. There are two versions of the x-specific code. One, in subdirectory /recur is recursive descent. The other, in subdirectory /lalr is bottom-up, driven by LALR(1) tables computed in the grammar module. In /recur, recursive functions must be added for each new phrase name added to the x grammar. The technique for removing left recursion is discussed in the analysis chapter. Function shift is called each time a symbol is recognized and used. Function reduce is called each time the rhs of a grammar is rule is completed. In /lalr the shift/reduce sequence is automatically produced if the grammar was, in fact, LALR(1). If it was not, the build would have broken in the grammar stage.

64 CHAPTER 3. EXTENDING XCOM Unit Test The program parsedriver.cpp contains a set of tests exercising the parser. It is a good place to add new tests for extensions. Trace define/undef macros at the head of lang.cpp can be interchanged and the parser rebuilt to give a verbose on-the-fly presentation of the recursive routines at work. 3.5 Changing Ast Builder tree representation node building unit test ast dump 3.6 Changing the Symbol Table It is reasonable to define vectors to always be big enough. That is, if a[i] is assigned and there is no i-th member, then the array is quietly extended out to position i. Trying to use the value of a non-existent a[i] would be reported as a runtime error. symbol representation adding type adding shape symbol table representation unit test symbol table dump 3.7 Changing Ast Walkers postorder walk values in walker locals opaque locals 3.8 Changing Code Generation new primitives calling runtime unit test asm dump

3.9. PRETTY PRINTING 65 dis dump 3.9 Pretty Printing The overloading of other ascii characters for multiple uses is annoying, both to the author of the extension and to the eventual users of the language. The text A + B does not immediately bring to mind set union. If the user program were to be typeset, the symbol could be used, and so on for the rest of the mathematical operators in the set extension. What is even more annoying is the obvious truth that by the end of compilation xcom has made this distinction. It knows. So provision is made for xcom to do the typesetting as a byproduct of compilation. Preceding a file name in the xcom command line with the flag -tex will cause xcom to dump out latex typesetting instructions. The xcom distribution renders x according to the choices Dijkstra made in his book A Discipline of Programming. The typesetting code can be extended, using the information in the symbol table, and the aesthetics of the language designer, to provide very pretty printing. 3.10 Machine Language Instructions, Registers and Faults 3.11 Arithmetic Expressions Sequencing Int Register Manager Left-to-right Float Constraint 3.12 Flow of Control Branches Fixups 3.13 Subprograms 3.14 Runtime