
Compiling Little Languages in Python

John Aycock
Department of Computer Science
University of Victoria
Victoria, B.C., Canada
aycock@csc.uvic.ca

Abstract

"Little languages" such as configuration files or HTML documents are commonplace in computing. This paper divides the work of implementing a little language into four parts, and presents a framework which can be used to easily conquer the implementation of each. The pieces of the framework have the unusual property that they may be extended through normal object-oriented means, allowing features to be added to a little language simply by subclassing parts of its compiler.

1 Introduction

Domain-specific languages, or "little languages," are frequently encountered when dealing with computers [3]. Configuration files, HTML documents, shell scripts: all are little structured languages, yet may lack the generality and features of full-blown programming languages.

Whether writing an interpreter for a little language, or compiling a little language into another language, compiler techniques can be used. In many cases, an extremely fast compiler is not needed, especially if the input programs tend to be small. Instead, other issues can predominate, such as compiler development time, maintainability of the compiler, and the ability to easily add new language features. This is Python's strong suit.

This paper describes some successful techniques I developed while working on two compilers for little languages: one for a subset of Java, the other an optimizing compiler for Guide, a CGI-programming language [12]. The net result is a framework which can be used to implement little languages easily.

[Figure 1: Compiler model. The phases Scanning, Parsing, Semantic Analysis, and Code Generation run in sequence; the scanner passes a token list to the parser, and an AST is passed between the later phases.]

2 Model of a Compiler

Like most nontrivial pieces of software, compilers are generally broken down into more manageable modules, or phases.
The design issues involved and the details of each phase are too numerous to discuss here in depth; there are many excellent books on the subject, such as [1] and [2]. I began with a simple model of a compiler having only four phases, as shown in Figure 1:

1. Scanning, or lexical analysis. Breaks the input stream into a list of tokens. For example, the expression "2 + 3 * 5" can be broken up into five tokens: number plus number times number. The values 2, 3, and 5 are attributes associated with the corresponding number tokens.

2. Parsing, or syntax analysis. Ensures that a list of tokens has valid syntax according to a grammar, a set of rules that describes the syntax of the language. For the above example, a typical expression grammar would be:

    expr   ::= expr + term
    expr   ::= term
    term   ::= term * factor
    term   ::= factor
    factor ::= number

In English, this grammar's rules say that an expression can be an expression plus a term, an expression may be a term by itself, and so on. Intuitively, the symbol on the left-hand side of "::=" may be thought of as a variable for which the symbols on the right-hand side may be substituted. Symbols that don't appear on any left-hand side, like +, *, and number, correspond to the tokens from the scanner.

The result of parsing is an abstract syntax tree (AST), which represents the input program. For "2 + 3 * 5," the AST would look like the one in Figure 2.

[Figure 2: Abstract syntax tree (AST) for "2 + 3 * 5": a + node whose children are number(2) and a * node, the * node having children number(3) and number(5).]

3. Semantic analysis. Traverses the AST one or more times, collecting information and checking that the input program had no semantic errors. In a typical programming language, this phase would detect things like type conflicts, redefined identifiers, mismatched function parameters, and numerous other errors. The information gathered may be stored in a global symbol table, or attached as attributes to the nodes of the AST itself.

4. Code generation. Again traversing the AST, this phase may directly interpret the program, or output code in C or assembly which would implement the input program. For expressions as simple as those in the example, they could be evaluated on the fly in this phase.

Each phase performs a well-defined task, and passes a data structure on to the next phase. Note that information only flows one way, and that each phase runs to completion before the next one starts.¹ This is in contrast to oft-used techniques which have a symbiosis between scanning and parsing, where not only may several phases be working concurrently, but a later phase may send some feedback to modify the operation of an earlier phase.
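For illustration, the grammar above can also be parsed by a hand-rolled recursive-descent routine. The sketch below is not part of the paper's framework (and uses modern Python 3); the function names are invented, and the left-recursive rules are rewritten as loops, as is usual for recursive descent. It shows how the grammar's shape gives "*" higher precedence than "+":

```python
# Hand-rolled sketch of the expression grammar, returning nested tuples
# instead of AST objects. Left recursion ("expr ::= expr + term") is
# handled with a while loop.

def parse_expr(tokens, i=0):
    """expr ::= expr + term | term"""
    node, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] == '+':
        right, i = parse_term(tokens, i + 1)
        node = ('+', node, right)
    return node, i

def parse_term(tokens, i):
    """term ::= term * factor | factor"""
    node, i = parse_factor(tokens, i)
    while i < len(tokens) and tokens[i] == '*':
        right, i = parse_factor(tokens, i + 1)
        node = ('*', node, right)
    return node, i

def parse_factor(tokens, i):
    """factor ::= number"""
    return int(tokens[i]), i + 1

# "2 + 3 * 5" parses with "*" binding tighter, matching Figure 2
ast, _ = parse_expr('2 + 3 * 5'.split())
```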
Certainly all little language compilers won't fit this model, but it is extremely clean and elegant for those that do. The main function of the compiler, for instance, distills into three lines which reflect the compiler's structure:

    f = open(filename)
    generate(semantic(parse(scan(f))))
    f.close()

In the remainder of this paper, I will examine each of the above four phases, showing how my framework can be used to implement the little expression language above. Following this will be a discussion of some of the inner workings of the framework's classes.

3 The Framework

A common theme throughout this framework is that the user should have to do as little work as possible. For each phase, my framework supplies a class which performs most of the work. The user's job is simply to create subclasses which customize the framework.

3.1 Lexical Analysis

Lexical analyzers, or scanners, are typically implemented one of two ways. The first way is to write the scanner by hand; this may still be the method of choice for very small languages, or where use of a tool to generate scanners automatically is not possible. The second method is to use a scanner generator tool, like lex [11], which takes a high-level description of the permitted tokens, and produces a finite state machine which implements the scanner.

Finite state machines are equivalent to regular expressions; in fact, one uses regular expressions to specify tokens to scanner generators! Since Python has regular expression support, it is natural to use them to specify tokens. (As a case in point, the Python module "tokenize" has regular expressions to tokenize Python programs.)

¹ There is some anecdotal evidence that parts of production compilers may be moving towards a similar model [18].

So GenericScanner, my generic scanner class, requires a user to create a subclass of it in which they specify the regular expressions that the scanner should look for. Furthermore, an "action" consisting of arbitrary Python code can be associated with each regular expression; this is typical of scanner generators, and allows work to be performed based on the type of token found.

Below is a simple scanner to tokenize expressions. The parameter to the action routines is a string containing the part of the input that was matched by the regular expression.

    class SimpleScanner(GenericScanner):
        def __init__(self):
            GenericScanner.__init__(self)

        def tokenize(self, input):
            self.rv = []
            GenericScanner.tokenize(self, input)
            return self.rv

        def t_whitespace(self, s):
            r' \s+ '
            pass

        def t_op(self, s):
            r' \+ | \* '
            self.rv.append(Token(type=s))

        def t_number(self, s):
            r' \d+ '
            t = Token(type='number', attr=s)
            self.rv.append(t)

Each method whose name begins with "t_" is an action; the regular expression for the action is placed in the method's documentation string. (The reason for this unusual design is explained in Section 4.1.) When the tokenize method is called, a list of Token instances is returned, one for each operator and number found. The code for the Token class is omitted; it is a simple container class with a type and an optional attribute.

White space is skipped by SimpleScanner, since its action code does nothing. Any unrecognized characters in the input are matched by a default pattern, declared in the action GenericScanner.t_default. This default method can of course be overridden in a subclass.

A trace of SimpleScanner on the input "2 + 3 * 5" is shown in Table 1.
    Input   Method         Token Added
    2       t_number       number (attribute 2)
    space   t_whitespace
    +       t_op           +
    space   t_whitespace
    3       t_number       number (attribute 3)
    space   t_whitespace
    *       t_op           *
    space   t_whitespace
    5       t_number       number (attribute 5)

    Table 1: Trace of SimpleScanner.

Scanners made with GenericScanner are extensible, meaning that new tokens may be recognized simply by subclassing. To extend SimpleScanner to recognize floating-point number tokens is easy:

    class FloatScanner(SimpleScanner):
        def __init__(self):
            SimpleScanner.__init__(self)

        def t_float(self, s):
            r' \d+ \. \d+ '
            t = Token(type='float', attr=s)
            self.rv.append(t)

How are these classes used? Typically, all that is needed is to read in the input program, and pass it to an instance of the scanner:

    def scan(f):
        input = f.read()
        scanner = FloatScanner()
        return scanner.tokenize(input)

Once the scanner is done, its result is sent to the parser for syntax analysis.

3.2 Syntax Analysis

The outward appearance of GenericParser, my generic parser class, is similar to that of GenericScanner. A user starts by creating a subclass of GenericParser, containing special methods which are named with the prefix "p_". These special methods encode grammar rules in their documentation strings; the code in the methods are actions which get executed when one of the associated grammar rules is recognized by GenericParser.
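The paper does not reproduce GenericScanner itself. As a rough idea of what the docstring-driven mechanism looks like, here is a minimal Python 3 sketch (the class name MiniScanner is invented; the real GenericScanner combines the patterns into one regular expression, as Section 4.2 describes, whereas this sketch simply tries each pattern in turn):

```python
import re

class Token:
    # simple container class, as described in the text
    def __init__(self, type, attr=None):
        self.type, self.attr = type, attr

class MiniScanner:
    """Collect the regular expressions from the docstrings of all t_*
    methods at construction time, then dispatch to the matching method
    for each piece of input."""
    def __init__(self):
        self.rules = [(name, re.compile(getattr(self, name).__doc__.strip()))
                      for name in dir(self) if name.startswith('t_')]

    def tokenize(self, input):
        self.rv = []
        pos = 0
        while pos < len(input):
            for name, pattern in self.rules:
                m = pattern.match(input, pos)
                if m:
                    getattr(self, name)(m.group())  # fire the action
                    pos = m.end()
                    break
            else:
                raise ValueError('bad character at position %d' % pos)
        return self.rv

class ExprScanner(MiniScanner):
    def t_whitespace(self, s):
        r'\s+'
        pass
    def t_op(self, s):
        r'[+*]'
        self.rv.append(Token(s))
    def t_number(self, s):
        r'\d+'
        self.rv.append(Token('number', s))

tokens = ExprScanner().tokenize('2 + 3 * 5')
```

The subclass looks just like SimpleScanner above: the tokens recognized are determined entirely by which t_ methods exist, so new tokens can be added by subclassing.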

The expression parser subclass is shown below. Here, the actions are building the AST for the input program. AST is also a simple container class; each instance of AST corresponds to a node in the tree, with a node type and possibly child nodes.

    class ExprParser(GenericParser):
        def __init__(self, start='expr'):
            GenericParser.__init__(self, start)

        def p_expr_1(self, args):
            ' expr ::= expr + term '
            return AST(type=args[1],
                       left=args[0],
                       right=args[2])

        def p_expr_2(self, args):
            ' expr ::= term '
            return args[0]

        def p_term_1(self, args):
            ' term ::= term * factor '
            return AST(type=args[1],
                       left=args[0],
                       right=args[2])

        def p_term_2(self, args):
            ' term ::= factor '
            return args[0]

        def p_factor_1(self, args):
            ' factor ::= number '
            return AST(type=args[0])

        def p_factor_2(self, args):
            ' factor ::= float '
            return AST(type=args[0])

The grammar's start symbol is passed to the constructor. ExprParser builds the AST from the bottom up. Figure 3 shows the AST in Figure 2 being built, and the sequence in which ExprParser's methods are invoked.

[Figure 3: AST construction, showing the sequence of calls to the p_ methods (p_factor_1, p_term_2, p_term_1, p_expr_2, p_expr_1) and the AST after each call.]

The "args" passed in to the actions are based on a similar idea used by yacc [11], a prevalent parser generator tool. Each symbol on a rule's right-hand side has an attribute associated with it. For token symbols like +, this attribute is the token itself. All other symbols' attributes come from the return values of actions, which, in the above code, means that they are subtrees of the AST. The index into args comes from the position of the symbol in the rule's right-hand side. In the running example, the call to p_expr_1 has len(args) == 3: args[0] is expr's attribute, the left subtree of + in the AST; args[1] is +'s attribute, the token +; args[2] is term's attribute, the right subtree of + in the AST.

The routine to use this subclass is straightforward:

    def parse(tokens):
        parser = ExprParser()
        return parser.parse(tokens)

Although omitted for brevity, ExprParser can be subclassed to add grammar rules and actions, the same way the scanner was subclassed.

After syntax analysis, the parser has produced an AST, and verified that the input program adheres to the grammar rules. Next, the input's meaning must be checked by the semantic analyzer.

3.3 Semantic Analysis

Semantic analysis is performed by traversing the AST. Rather than spread code to traverse an AST all over the compiler, I have a single base class, ASTTraversal, which knows how to walk the tree. Subclasses of ASTTraversal supply methods which get called depending on what type of node is encountered. To determine which method to invoke, ASTTraversal will first look for a method with the same name as the node type (plus the prefix "n_"), then will fall back on an optional default method if no more specific method is found.

Of course, ASTTraversal can supply many different traversal algorithms. I have found three useful: preorder, postorder, and a pre/postorder combination. (The latter allows methods to be called both on entry to, and exit from, a node.)

For example, say that we want to forbid the mixing of floating-point and integer numbers in our expressions:

    class TypeCheck(ASTTraversal):
        def __init__(self, ast):
            ASTTraversal.__init__(self, ast)
            self.postorder()

        def n_number(self, node):
            node.exprtype = 'number'

        def n_float(self, node):
            node.exprtype = 'float'

        def default(self, node):
            # this handles + and * nodes
            lefttype = node.left.exprtype
            righttype = node.right.exprtype
            if lefttype != righttype:
                raise 'Type error.'
            node.exprtype = lefttype

I found the semantic checking code easier to write and understand by taking the (admittedly less efficient) approach of making multiple traversals of the AST: each pass performs a single task. TypeCheck is invoked from a small glue routine:

    def semantic(ast):
        TypeCheck(ast)
        #
        # Any other ASTTraversal classes
        # for semantic checking would be
        # instantiated here...
        #
        return ast

After this phase, I have an AST for an input program that is lexically, syntactically, and semantically correct, but that does nothing. The final phase, code generation, remedies this.

3.4 Code Generation

The term "code generation" is somewhat of a misnomer. As already mentioned, this phase will traverse the AST and implement the input program, either directly through interpretation, or indirectly by emitting some code. Our expressions, for instance, can be easily interpreted:

    class Interpret(ASTTraversal):
        def __init__(self, ast):
            ASTTraversal.__init__(self, ast)
            self.postorder()
            print ast.value

        def n_number(self, node):
            node.value = int(node.attr)

        def n_float(self, node):
            node.value = float(node.attr)

        def default(self, node):
            left = node.left.value
            right = node.right.value
            if node.type == '+':
                node.value = left + right
            else:
                node.value = left * right
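ASTTraversal itself is not listed in the paper. A minimal Python 3 sketch of its reflective postorder dispatch might look like the following (MiniTraversal and Evaluate are invented names; the real class also offers preorder and pre/postorder walks). It shows why getattr with a fallback is needed: a node type like '+' cannot be a method name.

```python
class AST:
    # simple container node, as described in Section 3.2
    def __init__(self, type, attr=None, left=None, right=None):
        self.type, self.attr = type, attr
        self.left, self.right = left, right

class MiniTraversal:
    """Sketch of ASTTraversal-style dispatch: look for a method named
    'n_' + node type, falling back on default() when none exists."""
    def __init__(self, ast):
        self.ast = ast

    def postorder(self, node=None):
        node = node or self.ast
        if node.left:
            self.postorder(node.left)
        if node.right:
            self.postorder(node.right)
        getattr(self, 'n_' + node.type, self.default)(node)

    def default(self, node):
        pass

class Evaluate(MiniTraversal):
    def __init__(self, ast):
        MiniTraversal.__init__(self, ast)
        self.postorder()

    def n_number(self, node):
        node.value = int(node.attr)

    def default(self, node):  # '+' and '*' nodes fall through to here
        left, right = node.left.value, node.right.value
        node.value = left + right if node.type == '+' else left * right

# the AST from Figure 2: 2 + 3 * 5
tree = AST('+', left=AST('number', '2'),
                right=AST('*', left=AST('number', '3'),
                               right=AST('number', '5')))
Evaluate(tree)
```

After the traversal, tree.value holds 17, just as the paper's Interpret class would print.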

In contrast, my two compilers use an ASTTraversal to output an intermediate representation (IR) which is effectively a machine-independent assembly language. This IR then gets converted into MIPS assembly code in one compiler, C++ code in the other. I am considering ways to incorporate more sophisticated code generation methods into this framework, such as tree pattern matching with dynamic programming [8].

4 Inner Workings

4.1 Reflection

Extensibility presents some interesting design challenges. The generic classes in the framework, without any modifications made to them, must be able to divine all the information and actions contained in their subclasses, subclasses that didn't exist when the generic classes were created. Fortunately, an elegant mechanism exists in Python to do just this: reflection. Reflection refers to the ability of a Python program to query and modify itself at run time (this feature is also present in other languages, like Java and Smalltalk).

Consider, for example, my generic scanner class. GenericScanner searches itself and its subclasses at run time for methods that begin with the prefix "t_". These methods are the scanner's actions. The regular expression associated with the actions is specified using a well-known method attribute that can be queried at run time: the method's documentation string.

This wanton abuse of documentation strings can be rationalized. Documentation strings are a method of associating meta-information, comments, with a section of code. My framework is an extension of that idea. Instead of comments intended for humans, however, I have meta-information intended for use by my framework. As the number of reflective Python applications grows, it may be worthwhile to add more formal mechanisms to Python to support this task.

4.2 GenericScanner

Internally, GenericScanner works by constructing a single regular expression which is composed of all the smaller regular expressions it has found in the action methods' documentation strings.
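The combined-regular-expression technique, and the ordering pitfall discussed next, can be sketched in a few lines of modern Python 3 (this is an illustration of the idea, not the actual GenericScanner code; the rule list is invented):

```python
import re

# Each action's pattern becomes a named group in one combined regular
# expression; m.lastgroup then tells us which action to fire.
parts = [('t_float', r'\d+\.\d+'),   # must precede t_number (see below)
         ('t_number', r'\d+'),
         ('t_op', r'[+*]'),
         ('t_whitespace', r'\s+')]
combined = re.compile('|'.join('(?P<%s>%s)' % (name, pat)
                               for name, pat in parts))

def actions(input):
    """Return the (action name, matched text) pairs for an input string."""
    return [(m.lastgroup, m.group()) for m in combined.finditer(input)]

# Python uses Perl-style "first then longest" alternation: if t_number
# came first, '123.45' would begin matching as the number '123'.
wrong = re.compile(r'(?P<t_number>\d+)|(?P<t_float>\d+\.\d+)')
```

Running actions('2 + 3.5') pairs each lexeme with its action name, and the wrong pattern demonstrates why GenericScanner must control the order in which patterns are tried.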
Each component regular expression is mapped to its action using Python's symbolic group facility.

Unfortunately, there is a small snag. Python follows the Perl semantics for regular expressions rather than the POSIX semantics, which means it follows the "first then longest" rule: the leftmost part of a regular expression that matches is always taken, rather than using the longest match. In the above example, if GenericScanner were to order the regular expressions so that "\d+" appeared before "\d+ \. \d+", then the input 123.45 would match as the number 123, rather than the floating-point number 123.45. To work around this, GenericScanner makes two guarantees:

1. A subclass' patterns will be matched before any in its parent classes.

2. The default pattern for a subclass, if any, will be matched only after all other patterns in the subclass have been tried.

One obvious change to GenericScanner is to automate the building of the list of tokens: each "t_" method could return a list of tokens which would be appended to the scanner's list of tokens. The reason this is not done is because it would limit potential applications of GenericScanner. For example, in one compiler I used a subclass of GenericScanner as a preprocessor which returned a string; another scanner class then broke that string into a list of tokens.

4.3 GenericParser

GenericParser is actually more powerful than was alluded to in Section 3.2. At the cost of greater coupling between methods, actions for similar rules may be combined together rather than having to duplicate code; my original version of ExprParser is shown below.

    class ExprParser(GenericParser):
        def __init__(self, start='expr'):
            GenericParser.__init__(self, start)

        def p_expr_term(self, args):
            '''
                expr ::= expr + term
                term ::= term * factor
            '''
            return AST(type=args[1],
                       left=args[0],
                       right=args[2])

        def p_expr_term_2(self, args):
            '''
                expr ::= term
                term ::= factor
            '''
            return args[0]

        def p_factor(self, args):
            '''
                factor ::= number
                factor ::= float
            '''
            return AST(type=args[0])

Taking this to extremes, if a user is only interested in parsing and doesn't require an AST, ExprParser could be written:

    class ExprParser(GenericParser):
        def __init__(self, start='expr'):
            GenericParser.__init__(self, start)

        def p_rules(self, args):
            '''
                expr ::= expr + term
                expr ::= term
                term ::= term * factor
                term ::= factor
                factor ::= number
                factor ::= float
            '''

In theory, GenericParser could use any parsing algorithm for its engine. However, I chose the Earley parsing algorithm [6], which has several nice properties for this application [10]:

1. It is one of the most general algorithms known; it can parse all context-free grammars, whereas the more popular LL and LR techniques cannot. This is important for easy extensibility; a user should ideally be able to subclass a parser without worrying about properties of the resulting grammar.

2. It generates all its information at run time, rather than having to precompute sets and tables. Since the grammar rules aren't known until run time, this is just as well!

Unlike most other parsing algorithms, Earley's method parses ambiguous grammars. Currently, ambiguity presents a problem, since it is not clear which actions should be invoked. Future versions of GenericParser will have an ambiguity-resolution scheme to address this.

To accommodate a variety of possible parsing algorithms (including the one I use), GenericParser only makes one guarantee with respect to when the rules' actions are executed. A rule's action is executed only after all the attributes on the rule's right-hand side are fully computed. This condition is sufficient to allow the correct construction of ASTs.

4.4 ASTTraversal

ASTTraversal is the least unusual of the generic classes.
It could be argued that its use of reflection is superfluous, and the same functionality could be achieved by having its subclasses provide a method for every type of AST node; these methods could call a default method themselves if necessary.

The problems with this non-reflective approach are threefold. First, it introduces a maintenance issue: any additional node types added to the AST require all ASTTraversal's subclasses to be changed. Second, it forces the user to do more work, as methods for all node types must be supplied; my experience, especially for semantic checking, is that only a small set of node types will be of interest for a given subclass. Third, some node types may not map nicely into Python method names; I prefer to use node types that reflect the little language's syntax, like +, and it isn't possible to have methods named "n_+".² This latter point is where it is useful to have ASTTraversal reflectively probe a subclass and automatically invoke the default method.

4.5 Design Patterns

Although developed independently, the use of reflection in my framework is arguably a specialization of the Reflection pattern [4]. I speculate that there are many other design patterns where reflection can be exploited. To illustrate, ASTTraversal wound up somewhere between the Default Visitor [13] and Reflection patterns, although it was originally inspired by the Visitor pattern [9].

Two other design patterns can be applied to my framework too. First, the entire framework could be organized explicitly as a Pipes and Filters pattern [4]. Second, the generic classes could support interchangeable algorithms via the Strategy pattern [9]; parsing algorithms, in particular, vary widely in their characteristics, so allowing different algorithms could be a boon to an advanced user.

² Not directly, anyway...

5 Comparison to Other Work

The basis of this paper is the observation that little languages, and the need to implement them, are recurring problems. Not all authors even agree on this point: Shivers [16] presents an alternative to little languages and a Scheme-based implementation framework. Tcl was also developed to address the proliferation of little languages [14].

Other Python packages exist to automate parts of scanning and parsing. PyLR [5] uses a parsing engine written in C to accelerate parsing; kwparsing [17] automates the implementation of common programming language features at the cost of a more complex API. Both require precomputation of scanning and parsing information. YAPPS [15] uses the weakest parsing algorithm of the surveyed packages, and its author notes "It is not fast, powerful, or particularly flexible." There are occasional references to the PyBison package, which I was unable to locate.

For completeness, the mcf.pars package [7] is an interesting nontraditional system based on generalized pattern matching, but is sufficiently different from my framework to preclude any meaningful comparisons.

6 Not-so-Little Languages

This paper has presented a framework I have developed to build compilers in Python. It uses reflection and design patterns to produce compilers which can be easily extended using traditional object-oriented methods. At present, this framework has proved its effectiveness in the implementation of two "little languages." I plan to further test this framework by using it to build a compiler for a larger language: Python.

Availability

The source code for the framework and the examples used in this paper is available at http://www.csc.uvic.ca/~aycock.

Acknowledgments

I would like to thank Nigel Horspool and Shannon Jaeger for their comments on early drafts of this paper. Mike Zastre made several suggestions which improved Figure 3, and the anonymous referees supplied valuable feedback.

References

[1] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986.

[2] A. W. Appel. Modern Compiler Implementation in Java. Cambridge, 1998.

[3] J. Bentley. Little Languages. In More Programming Pearls, pages 83-100. Addison-Wesley, 1988.

[4] F. Buschmann, R. Meunier, H. Rohnert, P. Sommerlad, and M. Stal. Pattern-Oriented Software Architecture: A System of Patterns. Wiley, 1996.

[5] S. Cotton. PyLR. http://starship.skyport.net/crew/scott/pylr.html.

[6] J. Earley. An Efficient Context-Free Parsing Algorithm. Communications of the ACM, 13(2):94-102, 1970.

[7] M. C. Fletcher. mcf.pars. http://www.golden.net/~mcfletch/programming/.

[8] C. W. Fraser, D. R. Hanson, and T. A. Proebsting. Engineering a Simple, Efficient Code Generator Generator. ACM LOPLAS, 1(3):213-226, 1992.

[9] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns. Addison-Wesley, 1995.

[10] D. Grune and C. J. H. Jacobs. Parsing Techniques: A Practical Guide. Ellis Horwood, 1990.

[11] J. R. Levine, T. Mason, and D. Brown. lex & yacc. O'Reilly, 1992.

[12] M. R. Levy. Web Programming in Guide. Software, Practice and Experience. Accepted for publication, July 1998.

[13] M. E. Nordberg III. Variations on the Visitor Pattern. PLoP '96 Writer's Workshop.

[14] J. K. Ousterhout. Tcl and the Tk Toolkit. Addison-Wesley, 1994.

[15] A. J. Patel. YAPPS. http://theory.stanford.edu/~amitp/yapps/.

[16] O. Shivers. A Universal Scripting Framework. In Concurrency and Parallelism, Programming, Networking, and Security, pages 254-265. Springer, 1996.

[17] A. Watters. kwparsing. http://starship.skyport.net/crew/aaron_watters/kwparsing/.

[18] D. B. Wortman. Compiling at 1000 MHz and beyond. In Systems Implementation 2000, pages 194-206. Chapman & Hall, 1998.