JScheme : A Scheme Interpreter Embedded Within Java Source Code

CPSC 511 Term Project
Jeff Sember

Abstract

Mixing two or more programming languages within a single project can be nontrivial, since it involves different types of source code that must be compiled by different programs to generate various object or class files. We demonstrate a novel method of implementing JScheme, a variant of the Scheme language, that embeds both its source and object code within standard Java source files. We provide a compiler that scans Java source files, detects JScheme source within Java comments, and compiles this source so that the resulting procedures and definitions are accessible to the Java code. Java and JScheme code can thus be used together in a development process that requires only the provided JScheme compiler and runtime class library. We investigate the advantages and drawbacks of this approach, and discuss future enhancements.

1 Introduction

Mixing two or more programming languages within a single project is usually nontrivial. Code written in different languages is usually kept in separate files, which must be compiled and/or interpreted by different programs. Some languages, such as C++, involve both source and header files that must be kept synchronized, which further complicates the development process. For all but very small programs, determining which files need to be recompiled can be a challenge, since it depends on the dependencies between the source elements. As a result, the build process is usually coordinated by an external makefile or by an integrated development environment (IDE).

The Java programming language improves this situation in two ways. First, there are no header files per se; a class is self-contained within a single source file (though dependencies may still exist due to interfaces and inherited classes). Second, the Java compiler automatically recompiles only those classes that require recompilation, which removes the need for a makefile (though makefiles are often used for other elements of a Java programming project).

Combining Java code with code written in another language is still not a trivial process. Compiling the other language's source may require a makefile or IDE, and interaction between the languages at runtime requires a special library, such as the Java Native Interface [5].

We take a different approach that allows a program to use both Java and JScheme code. JScheme code is contained within standard Java source files, embedded within Java multiline comments. Support for JScheme is achieved in two stages. First, the Java source files are compiled by the JScheme compiler, which parses the embedded JScheme code and writes the results back into the same Java source file in a compact form. Second, during the normal execution of the Java application, the JScheme code is evaluated by a provided runtime package, which supports interaction with Java code. We stress that we do not change the syntax of Java source files: they remain standard Java and must still be compiled by a Java compiler, since the JScheme compiler processes only the JScheme source code.

In section 2, we describe the JScheme language. In section 3, we describe how the JScheme compiler is used to compile JScheme source code, and how a Java program can interact with the compiled code. Section 4 delves into some details of how JScheme has been implemented. We finish with some results and concluding remarks in section 5.

2 The JScheme language

JScheme implements a subset of the Scheme [1] language. For simplicity, we have omitted certain features of Scheme; for instance, the only numeric type supported is signed integers, and continuations are not supported. Nor have we optimized procedure evaluation to execute tail calls efficiently. The JScheme grammar is shown in figure 1.

JScheme contains a number of predefined procedures; see figure 2. For documentation describing these procedures, see [1]. The one nonstandard procedure in this list is currbindings, which returns a string describing the current environment bindings.

3 Working with JScheme

This section describes how JScheme source code is embedded within Java source files, and how the code can be used by a Java program.

3.1 Storing JScheme source code within Java source files

A JScheme <program> element is wrapped in a Java multiline comment in the following way:

  /*s <program> */
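Because JScheme lives in ordinary comments, locating it requires only lexical scanning, not a Java parser. The following hand-written Java sketch is hypothetical (the actual compiler's tokenizer was generated with makedfa, as described in section 4.2): it skips string and character literals and ordinary comments, tracks brace depth, and collects the text of each /*s comment that appears inside a class body. The class name JSchemeCommentScanner is ours, and requiring whitespace after the s is an assumption made so that ordinary comments such as /*see below*/ are not misread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the jscomp tokenizer: finds JScheme comments
// (a block comment whose opener "/*" is immediately followed by "s" and then
// whitespace) in Java source text, using only lexical scanning.
public class JSchemeCommentScanner {

    // Returns the text of each JScheme comment that appears inside a class body
    // (brace depth >= 1). Comments at depth 0 are ignored in this sketch.
    public static List<String> extract(String src) {
        List<String> programs = new ArrayList<>();
        int depth = 0; // nesting depth of { } braces seen so far
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (src.startsWith("//", i)) {
                // Single-line comment (e.g. the //[500 markers): skip to end of line.
                int eol = src.indexOf('\n', i);
                i = (eol < 0) ? n : eol + 1;
            } else if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated comment at offset " + i);
                }
                boolean jscheme = src.startsWith("/*s", i)
                        && i + 3 < end
                        && Character.isWhitespace(src.charAt(i + 3));
                if (jscheme && depth >= 1) {
                    programs.add(src.substring(i + 3, end));
                }
                i = end + 2;
            } else if (c == '"' || c == '\'') {
                // String or character literal: skip it so braces and comment
                // openers inside it are not misinterpreted.
                i++;
                while (i < n && src.charAt(i) != c) {
                    if (src.charAt(i) == '\\') {
                        i++; // skip the escaped character
                    }
                    i++;
                }
                i++; // skip the closing quote
            } else {
                if (c == '{') {
                    depth++;
                } else if (c == '}') {
                    depth--;
                }
                i++;
            }
        }
        return programs;
    }

    public static void main(String[] args) {
        String java = "public class Example {\n"
                + "  /*s (define (square x) (* x x)) */\n"
                + "  String s = \"/*s not a comment */\";\n"
                + "  char brace = '}';\n"
                + "}\n";
        for (String program : extract(java)) {
            System.out.println("JScheme program: " + program.trim());
        }
    }
}
```

Counting braces is also what lets a tool of this kind decide where the outermost class ends, which is where the compiler appends its generated field.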

  <program>     ::= <elem>*
  <elem>        ::= <exp> | <def> | <import> | ( begin <elem>+ )
  <import>      ::= #import <file:string>
  <def>         ::= ( define <id> <exp> )
                  | ( define ( <id> <formal:id>* ) <body> )
                  | ( define ( <id> . <varformals:id> ) <body> )
                  | ( define-datatype <name:id> <predicate:id> <dt-var>* )
  <body>        ::= <def>* <exp>+
  <dt-var>      ::= ( <variant:id> <dt-field>* )
  <dt-field>    ::= ( <field:id> <predicate:expr> )
  <lit>         ::= <boolean> | <number> | <character> | <string> | <quotation>
  <boolean>     ::= #t | #f
  <character>   ::= #\<any character> | #\space | #\newline
  <quotation>   ::= '<datum> | ( quote <datum> )
  <datum>       ::= <boolean> | <number> | <character> | <string> | <symbol> | <list> | <vector>
  <symbol>      ::= <id> | <keyword>
  <list>        ::= ( <datum>* ) | ( <datum>+ . <datum> ) | '<datum>
  <vector>      ::= #( <datum>* )
  <exp>         ::= <id> | <lit> | <proccall> | <lambdaexp>
                  | ( if <test:exp> <consequent:exp> [<alternate:exp>] )
                  | ( set! <id> <exp> )
                  | ( cond <cond_clause>+ )
                  | ( cond <cond_clause>* ( else <exp>+ ) )
                  | ( and <exp>* )
                  | ( or <exp>* )
                  | ( let [<id>] ( <binding>* ) <body> )
                  | ( let* ( <binding>* ) <body> )
                  | ( letrec ( <binding>* ) <body> )
                  | ( begin <exp>+ )
                  | ( cases <name:id> <expr> <case>* [( else <def:expr> )] )
                  | ( do ( <do_var>* ) ( <test:exp> <after:exp>* ) <cmd:exp>* )
  <case>        ::= ( <variant-name:id> ( <field:id>* ) <body> )
  <proccall>    ::= ( <operator:exp> <operand:exp>* )
  <lambdaexp>   ::= ( lambda <formals> <body> )
  <formals>     ::= ( <id>* ) | <id>
  <cond_clause> ::= ( <condition:exp> <exp>+ ) | ( <exp> )
  <binding>     ::= ( <id> <exp> )

Figure 1: JScheme grammar

  *  +  -  /  <  <=  =  >  >=
  add1  append  boolean?  cadr  car  cdr  char->integer  char?  cons
  currbindings  display  equal?  eqv?  expt  foldl  foldr  integer->char
  length  list  list->vector  list-of  list-ref  list?  make-vector  map
  member  newline  not  null?  number->string  number?  pair?  printf
  procedure?  reverse  set-car!  set-cdr!  string->symbol  string-append
  string-length  string-ref  string?  sub1  symbol->string  symbol?
  vector  vector->list  vector-fill!  vector-length  vector-ref
  vector-set!  vector?  write-char

Figure 2: JScheme library procedures

The s immediately following the comment opener tells the JScheme compiler that this comment contains a JScheme <program> element. Each such comment must appear within a Java class (i.e., within the outermost enclosing {...} braces). Inner classes are not recognized by the JScheme compiler: any <program> elements placed within an inner class are treated as if they were part of the outermost enclosing class.

The <import> element allows JScheme code to be stored in an external source file. When the compiler encounters this element, the next string token (in quotation marks) is interpreted as the name of a text file containing JScheme source code, which is then read in as if it appeared directly within the Java source file.

Using Java comments to store data that is not part of the program itself is not a new idea. Comments are used to store javadoc tags for generating class, method, and field documentation, and have also been used to store annotations for static type checking [2]. To our knowledge, this is the first time comments have been used to store source code.

3.2 Compiling embedded JScheme source code

Once the JScheme source code has been placed within the Java source code, it must be compiled by the JScheme compiler, which is invoked as follows:

  jscomp [<options>] <file>*

Each <file> is the name of a Java source file (the .java extension may be omitted). Alternatively, if the -d option is given, each <file> is interpreted as a directory, and every Java file within the subtree rooted at that directory is compiled. Figure 3 lists the options for the jscomp compiler.
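The -d option amounts to a recursive directory walk. A minimal sketch using the standard java.nio.file.Files.walk method is shown below; the class SourceFinder and its methods are hypothetical names, and using the extension set by -x as the filter is our assumption rather than documented jscomp behavior.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the -d behavior: collect every regular file with the
// given extension in the subtree rooted at 'root'. Not part of jscomp.
public class SourceFinder {

    public static List<Path> findSources(Path root, String ext) throws IOException {
        String suffix = "." + ext;
        // Files.walk visits root and all of its descendants; the stream holds
        // open directory handles, so it is closed with try-with-resources.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(suffix))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Mirrors "the .java extension may be omitted" for single-file arguments.
    public static Path resolveFileArgument(String arg, String ext) {
        String suffix = "." + ext;
        return Paths.get(arg.endsWith(suffix) ? arg : arg + suffix);
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : findSources(root, "java")) {
            System.out.println(p);
        }
    }
}
```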
  jscomp : JScheme compiler
  Usage: jscomp [<options>] <file>*
  Options:
    --verbose, -v  : verbose output
    --echo, -e     : echo input to stdout
    --trace, -T    : trace scanner
    -s             : simulation only; don't modify any files
    -x <ext>       : change extension (default=java)
    -b             : don't save backups of originals
    -d [<path>]    : process every .java file in <path>
    -D             : don't store source file debug information
    --help, -h     : help

Figure 3: JScheme compiler options

If a Java source file contains JScheme source, the compiler adds a static field to the end of the Java class definition; see figure 4. The compiler inserts this Java source code between the single-line Java comments //[500 and //]500 (the pair of numbers, which are equal in value, are arbitrarily chosen by the compiler). Any old source code between such comment pairs is deleted and replaced (if necessary) by the compiler, so users should avoid modifying this code themselves.

```java
public class Example {

  /*s
    (define (factorial n)
      (letrec ((fac-times
                (lambda (n acc)
                  (if (= n 0)
                      acc
                      (fac-times (- n 1) (* acc n))))))
        (fac-times n 1)))
  */

  public static void main(String[] args) {
    //...
  }

  //[500
  static JSRuntime rt = new JSRuntime(
     "2jCHx/S3PB1mA9wAc3jcoqfb51... c3h4xh6p5/ty"
    +"np6ay8cv9zxentnf17d87vkh7t... CLIRAwAAAACw"
    +"FTYq2FbYqGwTAm1UtglBNyLIRI... AAAAAAAAAAAA"
  );
  //]500
}
```

Figure 4: JScheme compiler output

As a safety feature, before any changes are made to an existing Java source file, a backup of the file is stored in a backup directory named jscheme backups, located in the user's home directory (as given by the Java user.home system property). Up to five backup copies of each file are retained within this directory, with each new backup replacing the oldest. The -b option disables this backup process.

3.3 The JScheme runtime interpreter

As seen in the previous section, the JScheme compiler inserts a rather cryptic set of lines within a Java source file. At the end of the appropriate containing class, a static class variable rt of type JSRuntime is defined and initialized. The initialization string contains the compiled JScheme code, encoded using the Base64 [3] encoding scheme. When the Java class is initialized by the JVM, the rt field is constructed, which causes all of the JScheme definitions and expressions contained within the <program> to be evaluated.

A Java program can interact with a JScheme runtime interpreter (rt) in several ways. For instance, it can have the interpreter evaluate additional <program>s by calling the single-argument rt.eval method, passing the text of the <program> as a string. This method returns an SNode object representing the value of the last expression evaluated. A Java program can also add bindings to the current rt environment by calling the single-argument rt.define method with an SNode object as its argument. This method returns an SId object containing the (unique) identifier that is bound to the value.
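The paper does not show the encoder itself. As a hypothetical sketch only, the following uses java.util.Base64 (a standard class since Java 8, so not necessarily what the original project used) to turn a byte array into concatenated string literals shaped like the initializer in figure 4, and to decode the joined string again as the runtime must. The byte format of compiled SNodes is not described, so placeholder bytes are encoded, and the names EmbeddedData, toJavaLiteral, and fromString are ours.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of embedding binary data in Java source as Base64
// string literals. Not the JScheme compiler's actual encoder.
public class EmbeddedData {

    // Encodes bytes as Base64 and formats them as a Java expression made of
    // concatenated string literals, each holding at most 'width' characters.
    public static String toJavaLiteral(byte[] data, int width) {
        String b64 = Base64.getEncoder().encodeToString(data);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < b64.length(); i += width) {
            if (i > 0) {
                sb.append("\n+"); // continuation line, as in figure 4
            }
            sb.append('"').append(b64, i, Math.min(b64.length(), i + width)).append('"');
        }
        return sb.length() == 0 ? "\"\"" : sb.toString();
    }

    // Runtime side: after javac joins the literals, the string is decoded.
    public static byte[] fromString(String b64) {
        return Base64.getDecoder().decode(b64);
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for compiled SNode data.
        byte[] compiled = "(define (square x) (* x x))".getBytes(StandardCharsets.UTF_8);
        System.out.println(toJavaLiteral(compiled, 16));
        String joined = Base64.getEncoder().encodeToString(compiled);
        System.out.println(new String(fromString(joined), StandardCharsets.UTF_8));
    }
}
```

Base64 output uses only letters, digits, +, / and =, none of which need escaping inside a Java string literal, which makes a string a convenient container for the data.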
To allow JScheme code to call Java code, a Java-defined JScheme procedure can be added to the JSRuntime environment by calling the two-argument rt.define method with the procedure name and an object that implements the IJavaProcedure interface. Figure 5 displays a sample program that demonstrates these features, along with the program's output. For more details on how these methods are to be used, the reader should consult the JSRuntime documentation.

```java
import jscheme.*;

public class Example {

  /*s
    (define (factorial n)
      (letrec ((fac-times
                (lambda (n acc)
                  (if (= n 0)
                      acc
                      (fac-times (- n 1) (* acc n))))))
        (fac-times n 1)))
  */

  public static void main(String[] args) {
    try {
      System.out.print("factorial 7 = ");
      System.out.println(rt.eval("(factorial 7)").intValue());

      // add a Java-defined procedure myproc which calculates
      // the sum of its arguments, which are assumed to be numbers
      rt.define("myproc", new IJavaProcedure() {
        public SNode evaluateApp(JSRuntime rt, SNode[] args) {
          int sum = 0;
          for (int i = 0; i < args.length; i++)
            sum += args[i].intValue();
          return new SNumber(sum);
        }
      });

      // call myproc
      SNode result = rt.eval("(myproc 20 30 5)");

      // add binding to the result
      SId id = rt.define(result);

      // print the value of the binding
      rt.eval("(printf \"value of " + id + " is: ~a~n\" " + id + ")");
    } catch (SchemeException e) {
      System.out.println(rt.toString(e, true));
    }
  }

  //[500
  static JSRuntime rt = new JSRuntime(
     "2jCHx/S3PB1mA9wAc3jcoqfb5Tkg... nvbgaaaa=="
  );
  //]500
}
```

Figure 5: JScheme test program and output

4 Implementation Details

In this section, we describe some of the problems encountered during development of the JScheme project and the solutions we adopted.
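Sections 4.1 and 4.3 describe an evaluator in which each node is evaluated against a runtime environment, and closures capture chained environments. The following is a hypothetical miniature of that model, written only for illustration: the names MiniEval, Env, Node, Num, Id, Lambda, App, Prim, and Closure are ours, not classes of the jscheme package, and Java 16 or later (records, pattern matching for instanceof) is assumed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical miniature of the evaluation model in sections 4.1 and 4.3.
// None of these classes are the real JScheme SNode/JSRuntime classes.
public class MiniEval {

    // An environment: a table of bindings plus a (possibly null) link to the
    // environment it extends. Extending never modifies the parent.
    public static final class Env {
        private final Map<String, Object> table = new HashMap<>();
        private final Env parent;

        public Env(Env parent) { this.parent = parent; }

        public Env extend(List<String> names, List<Object> values) {
            if (names.size() != values.size()) {
                throw new RuntimeException("wrong number of arguments");
            }
            Env e = new Env(this);
            for (int i = 0; i < names.size(); i++) e.table.put(names.get(i), values.get(i));
            return e;
        }

        // Used in this sketch only to populate the global environment.
        public void define(String name, Object value) { table.put(name, value); }

        // Lookup walks the chain: linear in the number of nested environments.
        public Object lookup(String name) {
            for (Env e = this; e != null; e = e.parent) {
                if (e.table.containsKey(name)) return e.table.get(name);
            }
            throw new RuntimeException("unbound identifier: " + name);
        }
    }

    public interface Node { Object eval(Env env); }

    // A built-in procedure implemented in Java.
    public interface Prim { Object apply(List<Object> args); }

    // A closure pairs a lambda's parameters and body with its defining environment.
    public record Closure(List<String> params, Node body, Env env) {}

    // Numbers evaluate to themselves.
    public record Num(int value) implements Node {
        public Object eval(Env env) { return value; }
    }

    // Identifiers evaluate to the value bound to them in the environment.
    public record Id(String name) implements Node {
        public Object eval(Env env) { return env.lookup(name); }
    }

    // A lambda expression evaluates to a closure over the current environment.
    public record Lambda(List<String> params, Node body) implements Node {
        public Object eval(Env env) { return new Closure(params, body, env); }
    }

    // An application evaluates operator and operands, then runs the body in the
    // closure's saved environment extended with the argument bindings.
    public record App(Node operator, List<Node> operands) implements Node {
        public Object eval(Env env) {
            Object f = operator.eval(env);
            List<Object> args = operands.stream().map(n -> n.eval(env)).toList();
            if (f instanceof Prim p) return p.apply(args);
            if (f instanceof Closure c) return c.body().eval(c.env().extend(c.params(), args));
            throw new RuntimeException("not a procedure: " + f);
        }
    }

    public static void main(String[] args) {
        Env global = new Env(null);
        global.define("+", (Prim) xs -> xs.stream().mapToInt(x -> (Integer) x).sum());
        // ((lambda (y) (lambda (x) (+ x y))) 3) yields a closure that remembers y = 3
        Node makeAdder = new Lambda(List.of("y"),
                new Lambda(List.of("x"), new App(new Id("+"), List.of(new Id("x"), new Id("y")))));
        Object add3 = new App(makeAdder, List.of(new Num(3))).eval(global);
        global.define("add3", add3);
        System.out.println(new App(new Id("add3"), List.of(new Num(4))).eval(global)); // prints 7
    }
}
```

Because Env.extend wraps the parent instead of copying it, capturing an environment in a closure is cheap, but each lookup may walk the whole chain, which is the running-time trade-off discussed in section 4.3.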

4.1 Internal representation of JScheme values

During compilation, every JScheme expression or definition is parsed into an object we call an SNode. SNode subclasses represent identifiers, symbols, values (numbers, boolean constants, characters, strings, pairs, and vectors), and procedure-related types (procedures, closures, and procedure applications). An additional SNode type, SUser, allows Java programs to wrap arbitrary Java objects within SNodes.

Each SNode is evaluated with respect to a JSRuntime object. Some nodes, such as numbers and strings, evaluate to themselves. Others, such as identifiers, use the JSRuntime object to look up the values bound to them. Procedures evaluate to closures, and procedure applications evaluate their arguments and use the resulting closure to evaluate the procedure's body.

Compiled JScheme data includes tokens that record the location within a source file that produced each JScheme element. These are useful for debugging, since the interpreter maintains a stack of these tokens during evaluation and can produce a stack trace when an exception occurs. The drawback is that these tokens increase the size of the compiled code by about 50%. They can be omitted by running the JScheme compiler with the -D option.

4.2 Compiled JScheme programs

The JScheme compiler parses JScheme source code, then encodes the resulting SNodes into a compact form that can be accessed by Java code. One might ask why this is necessary, since the JSRuntime class provides an eval method that does essentially the same thing. There are two main advantages to precompiling the JScheme code. First, compiling the source code into SNodes is six to seven times slower than extracting the SNodes from the Base64-encoded strings. Second, the JScheme source code need not appear in its uncompiled state within the compiled Java code.
Precompiling also makes the code difficult to examine, which may be important for proprietary software.

We did not need to write a Java source code parser. A Java tokenizer was sufficient, since we only need to recognize the special multiline comments and count nested braces to detect the start and end of (non-inner) Java classes. Both the Java and JScheme tokenizers were constructed using makedfa, a tool that constructs a deterministic finite-state automaton from a set of regular expressions [6].

We have chosen to store the compiled JScheme code as a Base64-encoded string. This is not the most memory-efficient way to store the compiled code: each character of the string decodes to only six bits, and since Java characters are 16-bit Unicode values, ten bits per character are wasted. A more memory-efficient storage scheme would be to store the compiled data as a static array of integers, in which every bit is used. We chose strings because they have a more compact representation in source code: each character of an integer literal could represent at most four bits of compiled data (if the integers were written in hexadecimal), compared with six bits per character for Base64.

4.3 Environments and closures

An environment is a data type that associates a value with each element of a finite set of symbols [4]. Every JSRuntime object contains an environment that binds values to SId nodes. A procedure requires a snapshot of the environment at the time the procedure is evaluated, for later use when the procedure is applied; a procedure paired with such a snapshot is called a closure.

In JScheme, we extend an environment by wrapping the old environment within a new one. An environment consists of a hash table that binds identifiers to values, and a (possibly null) pointer to the earlier (nonextended) environment. Environments are thus immutable objects, which makes implementing closures easy. The cost of this approach is in running time: finding the value bound to an identifier takes time linear in the number of nested environments, which may be large in the case of recursive procedure calls. A more efficient method of implementing closures remains a future enhancement for the JScheme project.

5 Results

To test the JScheme system, we created an Eclipse project (jstest) containing several test files. The results of each test are described below.

1. We found a Scheme implementation of sets using balanced binary trees and attempted to compile it as JScheme source code. It failed at runtime, since the standard Scheme procedure zero? does not exist in JScheme. After we added this procedure to the test program, the program ran correctly.
2. We found a Scheme implementation of the sieve of Eratosthenes and attempted to compile it and use it to generate the first thirty prime numbers. It failed at runtime due to the missing procedure quotient. It ran correctly once we added this procedure to the sieve code.
3. We took a small Scheme program that the author wrote for an undergraduate assignment and embedded it within a Java program (Test3). Initially, the program would not compile, since in several places the empty list () had not been quoted. Once this was corrected, the program compiled and ran correctly.

The JScheme compiler and runtime interpreter allow Java and JScheme code to interact with each other within a standard Java runtime environment.

Embedding JScheme code within Java source files simplifies the organization of a programming project, since the JScheme compiler can read, process, and modify the Java source files without creating any auxiliary object or data files. This approach is practical, since the compiled JScheme code can be stored compactly as static data within Java classes (though a more memory-efficient representation would use integers instead of strings; see section 4.2). Calling JScheme code from a Java program (and vice versa) is supported by one additional package, which requires only a single import statement.

There are two main drawbacks to this approach. First, since we compile JScheme into data that is interpreted at runtime by a Java package, it cannot run as efficiently as code compiled for faster execution, for instance to Java bytecode [7] or, better yet, to native machine code. Second, embedding Scheme source in Java comments defeats any IDE features that aid in writing Scheme code, such as syntax coloring or smart indenting (though Eclipse will perform simple brace matching in comments). Keeping the JScheme source code in its own file (with an extension that the IDE recognizes) and importing it via the #import directive may help with this problem.

Future enhancements to the JScheme project might involve extending the JScheme language to include additional standard features such as continuations and tail-call optimization, as well as support for other numeric types such as floating-point and complex numbers.

References

[1] H. Abelson et al. Revised report on the algorithmic language Scheme. Higher Order Symbol. Comput., 11(1):7-105, 1998.
[2] C. Flanagan, K. Rustan M. Leino, M. Lillibridge, G. Nelson, J. B. Saxe, and R. Stata. Extended static checking for Java. In PLDI '02: Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, pages 234-245, New York, NY, USA, 2002. ACM.
[3] N. Freed and N. Borenstein. Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies. Internet RFCs, 1996.
[4] D. P. Friedman, C. T. Haynes, and M. Wand. Essentials of Programming Languages (2nd ed.). Massachusetts Institute of Technology, Cambridge, MA, USA, 2001.
[5] S. Liang. Java Native Interface: Programmer's Guide and Reference. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.

[6] J. Sember. Tokenizing with deterministic finite state automata, 2005. http://www.sfu.ca/~jpsember/tokenizer.html.
[7] B. P. Serpette and M. Serrano. Compiling Scheme to JVM bytecode: a performance study. In ICFP '02: Proceedings of the Seventh ACM SIGPLAN International Conference on Functional Programming, pages 259-270, New York, NY, USA, 2002. ACM.