IMPLEMENTING PARSERS AND STATE MACHINES IN JAVA. Terence Parr University of San Francisco Java VM Language Summit 2009

Similar documents
CS577 Modern Language Processors. Spring 2018 Lecture Interpreters

Building Compilers with Phoenix

LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Context Threading: A flexible and efficient dispatch technique for virtual machine interpreters

Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

CS 406/534 Compiler Construction Putting It All Together

Parsing III. (Top-down parsing: recursive descent & LL(1) )

LR Parsing E T + E T 1 T

Bottom-Up Parsing. Parser Generation. LR Parsing. Constructing LR Parser

Tizen/Artik IoT Lecture Chapter 3. JerryScript Parser & VM

Computing Inside The Parser Syntax-Directed Translation. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

CSc 453 Interpreters & Interpretation

Dispatch techniques and closure representations

Outline CS412/413. Administrivia. Review. Grammars. Left vs. Right Recursion. More tips forll(1) grammars Bottom-up parsing LR(0) parser construction

Question Marks 1 /20 2 /16 3 /7 4 /10 5 /16 6 /7 7 /24 Total /100

Compiler Optimisation

What the CPU Sees Basic Flow Control Conditional Flow Control Structured Flow Control Functions and Scope. C Flow Control.

Lecture 8: Deterministic Bottom-Up Parsing

CS399 New Beginnings. Jonathan Walpole

CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

CSc 520 Principles of Programming Languages

Parsing Algorithms. Parsing: continued. Top Down Parsing. Predictive Parser. David Notkin Autumn 2008

Lecture 7: Deterministic Bottom-Up Parsing

CSc 453. Compilers and Systems Software. 18 : Interpreters. Department of Computer Science University of Arizona. Kinds of Interpreters

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

Parser Generation. Bottom-Up Parsing. Constructing LR Parser. LR Parsing. Construct parse tree bottom-up --- from leaves to the root

Compiling Techniques

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division. P. N. Hilfinger

Mixed Mode Execution with Context Threading

Chapter 3. Parsing #1

CSCI312 Principles of Programming Languages

Code Generation Introduction

Formal Languages and Compilers Lecture VII Part 3: Syntactic A

CJT^jL rafting Cm ompiler

Course Overview. PART I: overview material. PART II: inside a compiler. PART III: conclusion

Syntactic Analysis. Top-Down Parsing

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler

Just-In-Time Compilation

APT Session 7: Compilers

Bottom-Up Parsing II. Lecture 8

An Introduction to Multicodes. Ben Stephenson Department of Computer Science University of Western Ontario

JavaCC Parser. The Compilation Task. Automated? JavaCC Parser

CSE 401 Final Exam. December 16, 2010

Conflicts in LR Parsing and More LR Parsing Types

Chapter 2: Instructions How we talk to the computer

CSc 553. Principles of Compilation. 2 : Interpreters. Department of Computer Science University of Arizona

JVM. What This Topic is About. Course Overview. Recap: Interpretive Compilers. Abstract Machines. Abstract Machines. Class Files and Class File Format

Writing a Sony PlayStation Emulator Using Java Technology

shift-reduce parsing

CSE 582 Autumn 2002 Exam 11/26/02

ECE 468/573 Midterm 1 October 1, 2014

CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

APT Session 6: Compilers

Review main idea syntax-directed evaluation and translation. Recall syntax-directed interpretation in recursive descent parsers

Compilers. Predictive Parsing. Alex Aiken

MidTerm Papers Solved MCQS with Reference (1 to 22 lectures)

Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

CSCI 1260: Compilers and Program Analysis Steven Reiss Fall Lecture 4: Syntax Analysis I

COP4020 Programming Assignment 2 - Fall 2016

Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science

CSE 401 Compilers. LR Parsing Hal Perkins Autumn /10/ Hal Perkins & UW CSE D-1

YETI. GraduallY Extensible Trace Interpreter VEE Mathew Zaleski, Angela Demke Brown (University of Toronto) Kevin Stoodley (IBM Toronto)

Compilers. Type checking. Yannis Smaragdakis, U. Athens (original slides by Sam

H.-S. Oh, B.-J. Kim, H.-K. Choi, S.-M. Moon. School of Electrical Engineering and Computer Science Seoul National University, Korea

CS553 Lecture Dynamic Optimizations 2

CSE 401/M501 Compilers

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

BEAMJIT: An LLVM based just-in-time compiler for Erlang. Frej Drejhammar

Computing Inside The Parser Syntax-Directed Translation. Comp 412

1. (a) What are the closure properties of Regular sets? Explain. (b) Briefly explain the logical phases of a compiler model. [8+8]

CS 432 Fall Mike Lam, Professor. Code Generation

In this simple example, it is quite clear that there are exactly two strings that match the above grammar, namely: abc and abcc

JVML Instruction Set. How to get more than 256 local variables! Method Calls. Example. Method Calls

Hardware, Modularity, and Virtualization CS 111

CS 4120 Introduction to Compilers

System Security Class Notes 09/23/2013

Intermediate Representations

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE. Date: Friday 20th May 2016 Time: 14:00-16:00

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 4. Y.N. Srikant

ROB: head/tail. exercise: result of processing rest? 2. rename map (for next rename) log. phys. free list: X11, X3. PC log. reg prev.

Memory Management: Virtual Memory and Paging CS 111. Operating Systems Peter Reiher

Dalvik VM Internals. Dan Bornstein Google

Bottom-Up Parsing. Lecture 11-12

Running class Timing on Java HotSpot VM, 1

Jazelle ARM. By: Adrian Cretzu & Sabine Loebner

Run-time Environments. Lecture 13. Prof. Alex Aiken Original Slides (Modified by Prof. Vijay Ganesh) Lecture 13

8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

LR Parsing of CFG's with Restrictions

CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1

BSCS Fall Mid Term Examination December 2012

Memory Management Basics

Homework I - Solution

Compiler construction 2009

Topics Power tends to corrupt; absolute power corrupts absolutely. Computer Organization CS Data Representation

Compilers. Bottom-up Parsing. (original slides by Sam

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Swift: A Register-based JIT Compiler for Embedded JVMs

Building a Parser Part III

Processors, Performance, and Profiling

Transcription:

IMPLEMENTING PARSERS AND STATE MACHINES IN JAVA Terence Parr University of San Francisco Java VM Language Summit 2009

ISSUES Generated method size in parsers Why I need DFA in my parsers Implementing DFA in Java Predicated DFA edges My kingdom for dynamic scoping Exceptions for flow-control Computed gotos for interpreter instruction dispatch

METHOD SIZE Methods generated from big grammar rules can blow past 64k bytes (I m trying to tighten up the generated code) Solution: manually split rules into multiple; not obvious to most users Can t do automatically due to actions; issues with args/locals a[int x] : A {...$x...} B {...$x...} ; a[int x] : a1 a2 ; a1 : A {...$x...} ; a2 : B {...$x...} ; Won t compile

LL(*) - MAKING DECISIONS WITH DFA Natural extension to LL(k) lookahead DFA: Allows cyclic DFA that can skip ahead past the modifiers to class or interface def // LL(*), but non-ll(k) for any k def : modifier* classdef modifier* interfacedef ; 'public'.. 'abstract' 'interface' Don t approximate entire CFG with a regex; i.e., don t include class or interface def rules Predict and proceed normally with LL parse s1 'class' s3=>2 s2=>1

SIMULATING STATE MACHINES Simulate DFA with bunch of tables public class T { static int[][] states static int[] s0 = { 0, 0, 0, 2, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0,... }; static int[] s1 = {... };... static int[][] states = {s0, s1,...}; } aside from being slow to initialize, we run into the method size limit No static arrays in.class: must init elements 1-by-1 sipush n newarray int ; create array dup ; dups array sipush i ; push index iconst_0 ; value to store iastore 5 or 6 bytes per element leaves room for only 10k elements for all tables in static ctor

IMPL. DFA WITH GOTO To avoid static init issue, encode directly in Java. Idea is to use CPU jmp instruction to change state. States are code addresses. Avoids big matrices, vectors. LR parsers can be made to run 6 to 10 times as fast as the best table-interpretive LR parsers. * But, can t do arbitrary cyclic graphs w/o gotos in Java Why not generate bytecodes directly? because of predicated DFA edges; might have to compile arbitrary Java expressions (more in a second...) * Thomas Pennello, Very Fast LR Parsing in Proceedings of the 1986 SIGPLAN symposium on Compiler construction

SO, WHAT DO WE DO? No gotos => must simulate DFA with arrays Encode shorts as chars: 0,9,32 is \u0000\u0009\u0020 Encode arrays as strings, which are stored statically in the constant pool, to avoid static init size limit (got trick from jflex) Have to unpack into short/int arrays at runtime to initialize Run-length-encode to compress sparse matrices/arrays

PREDICATED DFA Edges can be arbitrary expressions; can ref locals and parameters; can t move predicates out of method() to predict() method[boolean isabstract] : methodhead ( {isabstract}?=> ';' { stat* } ) ; s0 ';'&&{(isabstract)}? '{' s1=>1 s2=>2 void method(boolean isabstract) { methodhead(); int alt = dfa39.predict(input); Won t compile in predict() switch (alt) {... } }

DYNAMIC SCOPING Idea: f() calls g(); g() can see f() s parameters and locals Mostly evil, but solves some code gen issues: let s us automatically split large rule methods let s us move predicates out of context in generated DFA Or, I could manage my own parameter stack; can t do that for locals, though (defined in arbitrary code)

USING EXCEPTIONS FOR CONTROL FLOW Backtracking parser must rewind upon failure and try next alternative IF-gates after every rule/token match is slow, big, messy alternative1 if (!failed) return; rewind input alternative2... // code for an alternative match(id); if (failed) return; match( = ); if (failed) return; expr(); if (failed) return;... But, aren t exceptions very slow? Gafter told me only creating an exception object is slow; throwing is fast. I m guessing faster than testing failed all the time

BYTECODE INSTRUCTION DISPATCH Overhead of fetch-decode-execute cycle switch/loop is high Poor cache characteristics; perhaps even pipeline issues Typical structure: while ( code[ip]!=halt ) { switch ( code[ip] ) { case ADD :... break; case JMP :... break; case RET :... break; } ip++; }

COMPUTED GO TO Threaded interpreter puts dispatch into instruction implementation code; no loop; better cache characteristics codeptr[] impl = { &ADD, &JMP, &RET,... }; ADD:... goto impl[code[++ip]]; JMP:... goto impl[code[++ip]]; RET:... goto impl[code[++ip]]; Dalvik VM trick: don t even use address table; allocate n bytes per implementation where n is power of 2. Instr impl address is &firstinstr+code[ip]<<lgn. E.g., impl. s at offsets 0, 16, 32, 48,... for n=16

CONCLUSIONS From language implementors point of view, would be nice to have: >64k bytecodes in methods static arrays in.class files gotos for DFA dynamic scoping (splitting rules, predicated DFA edges) computed gotos for interpreters I m not suggesting exposing all this to Java users Perhaps secret option Neal Gafter quietly gives out? ;)