xtc: Making C Safely Extensible
Robert Grimm, New York University

The Problem
- Complexity of modern systems is staggering
  - Increasingly, a seamless, global computing environment
- System builders continue to use C (or C++)
  - Inadequate basis for coping with this complexity
  - There simply is too much code
- As a result, software quality and security suffer
  - Think critical updates for Windows or Mac OS X
- Metaprogramming holds considerable promise
  - Basic idea: Automatically compute (parts of) programs instead of manually writing them

The Power of Metaprogramming
- libasync [Mazieres, Usenix '01]
  - Provides extensions for asynchronous programming
  - Helps write straight-line code in presence of callbacks
- Capriccio [von Behren et al., SOSP '03]
  - Provides a highly scalable threading package
  - Uses stack depth analysis to reduce memory overhead
- MACEDON [Rodriguez et al., NSDI '04]
  - Provides DSL for specifying overlay protocols
  - Compiles state machine specification to runnable system
- How to give this power to other system builders?
  - Make language and compiler extensible through macros!

The Four Requirements
- Expressive enough for new definitional constructs
  - Domain-specific (e.g., closures, state machines)
  - General (e.g., modules, objects, generics, ...)

    char* host = "www.example.com";
    char* path = "/index.html";

    dnslookup(host, void closure(ip_t addr) {
      tcpconnect(addr, 80, void closure(int fd) {
        write(fd, path, strlen(path));
      });
    });

- Safe to statically detect program errors
  - Not only macro hygiene, but also new typing constraints
  - Avoid string of obscure error messages (e.g., C++ templates)

The Four Requirements (cont.)
- Fine-grained to compose and reuse extensions
  - Track dependencies
  - Detect conflicts
- Efficient code as compiler output
  - Specialized expansions (e.g., foreach for arrays)
  - Domain-specific optimizations
    - E.g., stack depth analysis for Capriccio
- No existing macro system meets all requirements
  - Little work on extensible type systems, macro composability

Enter xtc (extensible C)
- Macro system for C
  - Grammar modification rules
  - Abstract syntax tree (AST) transformation rules
  - Typing rules
  - Macro (meta-)module system
- Toolkit for extensible source-to-source transformers
  - Rats!, our extensible parser generator
  - AST representation and transformation framework
- Traditional compiler (gcc)
  - Conventional optimizations and binary code generation

Why C?
- Still popular for operating and distributed systems
  - E.g., Linux, BSDs, Apache, MySQL, BerkeleyDB
- Unlike Java or C#, no need for a virtual machine
  - Which, ironically, is often implemented in C
- Unlike C++, Objective-C, Java, C#, no object model
  - C++'s object system is very complex
  - Java's model is much simpler, but has performance issues
  - More powerful models can be quite useful
    - Think predicate dispatch [Millstein, OOPSLA '04]

Talk Outline
- Motivation and requirements
- Our parser generator and prototype framework
- The xtc macro system
  - Including interesting/open research issues
- Conclusions

Our Parser Generator and Prototype Framework
- Focused on extensibility and ease-of-use
  - Easy to change grammars, transformers
  - Changes should be localized
- Written in Java
  - Represents temporary engineering trade-off
    - Simple object system, GC, collections framework, reflection
  - Provides us with a first, large-scale test case
  - As soon as xtc is functional, will make it self-hosting
- Released as open-source
  - Currently at version 1.3, version 1.4 coming soon

Rats! Parser Generator
- Basic strategy: Packrat parsing
  - Originally described by Birman [PhD '70]
  - Rediscovered by Ford [ICFP '02, POPL '04]
    - Pappy: A packrat parser generator for Haskell
- Parses top-down (like LL)
  - Orders choices (unlike CFGs, LR, and LL)
  - Treats every character as a token (unlike LR and LL)
  - Supports unlimited lookahead through syntactic predicates
- Performs backtracking, but memoizes each result
  - Linear time performance
  - One large table: characters on x-axis, nonterminals on y-axis
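
The packrat strategy described on this slide can be sketched in a few lines. The following is a minimal illustration in Python, not Rats! itself; the toy grammar and the names PackratParser, _apply, memo, and evaluations are assumptions made for the example. It shows ordered choice, characters used directly as tokens, and a memo table keyed by (nonterminal, position), so each nonterminal is evaluated at most once per input position.

```python
class PackratParser:
    """Toy packrat parser (hypothetical example grammar):
         Sum     <- Product '+' Sum / Product
         Product <- Atom '*' Product / Atom
         Atom    <- '(' Sum ')' / [0-9]+
    Each rule returns (value, next_position) on success or None on failure.
    """

    def __init__(self, text):
        self.text = text
        self.memo = {}         # (rule name, position) -> result
        self.evaluations = 0   # how many rule bodies actually ran

    def _apply(self, rule, pos):
        # Memoization: run a rule at a position once, then reuse the result.
        key = (rule.__name__, pos)
        if key not in self.memo:
            self.evaluations += 1
            self.memo[key] = rule(pos)
        return self.memo[key]

    def _char(self, pos, c):
        # Characters are the tokens; there is no separate lexer.
        return pos + 1 if pos < len(self.text) and self.text[pos] == c else None

    def r_sum(self, pos):
        left = self._apply(self.r_product, pos)
        if left is not None:
            after = self._char(left[1], '+')
            if after is not None:
                right = self._apply(self.r_sum, after)
                if right is not None:
                    return (left[0] + right[0], right[1])
        # Second alternative of the ordered choice; Product at pos is memoized.
        return self._apply(self.r_product, pos)

    def r_product(self, pos):
        left = self._apply(self.r_atom, pos)
        if left is not None:
            after = self._char(left[1], '*')
            if after is not None:
                right = self._apply(self.r_product, after)
                if right is not None:
                    return (left[0] * right[0], right[1])
        return self._apply(self.r_atom, pos)

    def r_atom(self, pos):
        after_open = self._char(pos, '(')
        if after_open is not None:
            inner = self._apply(self.r_sum, after_open)
            if inner is not None:
                after_close = self._char(inner[1], ')')
                if after_close is not None:
                    return (inner[0], after_close)
        end = pos
        while end < len(self.text) and self.text[end].isdigit():
            end += 1
        return (int(self.text[pos:end]), end) if end > pos else None

    def parse(self):
        result = self._apply(self.r_sum, 0)
        if result is None or result[1] != len(self.text):
            raise SyntaxError("parse error")
        return result[0]
```

Calling PackratParser("2*(3+4)+5").parse() evaluates the expression while parsing. When the first alternative of a choice fails, the second alternative reuses the memoized result for the shared prefix instead of re-parsing it, which is where the linear-time bound comes from.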

Rats! vs. Pappy
- More concise grammar specifications
  - Automatic deduction of semantic values
  - Automatic construction of AST (using "generic nodes")
- Better support for debugging and error reporting
  - Pretty-printing of grammars and parsers
  - Automatic annotation of AST with source locations
  - Automatic generation of error messages
- More aggressive optimizations
  - Faster (~5 times), less memory (~4 times) than Pappy
  - Still slower than ANTLR and JavaCC (2-6 times)
  - But reasonable absolute performance (67 KB in 0.3 sec)

Our Framework So Far
- Three main abstractions
  - Abstract syntax tree (AST) nodes to represent code
  - Visitors to walk and transform AST nodes
    - Methods selected through reflection (with caching)
  - Utilities to store cross-visitor state
    - Analyzer for analyzing grammars
    - Printer for pretty printing source code
- Two axes of extensibility
  - New visitors to represent new compiler phases
  - New AST nodes to represent new language constructs
    - process() methods specified with new AST nodes
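
To make the reflection-based dispatch concrete, here is a small sketch in Python rather than the framework's Java. All class and method names (Number, Add, Neg, Visitor, dispatch, the visit prefix) are hypothetical and chosen only for illustration. The visitor looks up a method by the node's class name, caches the lookup, and can be extended along both axes: a new visitor for a new phase, or a new node type plus a matching method.

```python
class Number:
    def __init__(self, value):
        self.value = value

class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right

class Neg:
    """A node type added later for a new language construct."""
    def __init__(self, operand):
        self.operand = operand

class Visitor:
    """Selects visit<NodeClass>() by reflection and caches the lookup."""

    def __init__(self):
        self._cache = {}  # node class name -> bound method

    def dispatch(self, node):
        kind = type(node).__name__
        method = self._cache.get(kind)
        if method is None:
            method = getattr(self, "visit" + kind, None)  # reflective lookup
            if method is None:
                raise NotImplementedError("no method for node " + kind)
            self._cache[kind] = method
        return method(node)

class Evaluator(Visitor):
    """One 'phase': evaluate the tree."""
    def visitNumber(self, node):
        return node.value

    def visitAdd(self, node):
        return self.dispatch(node.left) + self.dispatch(node.right)

class Printer(Visitor):
    """Axis 1: a new visitor adds a new phase without touching the nodes."""
    def visitNumber(self, node):
        return str(node.value)

    def visitAdd(self, node):
        return "(" + self.dispatch(node.left) + " + " + self.dispatch(node.right) + ")"

class ExtendedEvaluator(Evaluator):
    """Axis 2: support for the new Neg node is added in one place."""
    def visitNeg(self, node):
        return -self.dispatch(node.operand)
```

Because methods are found by name at run time, adding Neg requires no change to Visitor or to the existing node classes. The cache keeps the cost of the reflective lookup to one per node kind and visitor instance.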

The xtc Macro System
- xtc implemented as pipeline ending with gcc
  - Ensures widespread availability
- xtc macros consist of four kinds of rules
  - Grammar rules for expressiveness
  - AST transformation rules for expressiveness, efficiency
  - Typing rules for safety, expressiveness
  - Dependency rules for composability

Macro Rules in Detail
- Grammar rules specify syntax
  - Add or remove alternatives in choices
    - May specify before and after constraints on named alternatives
  - Add new productions
- AST transformation rules reduce language, optimize
  - Expressed through templates
    - Literal syntax mixed with pattern expressions
  - Selected based on constraints
    - Nonterminals, kinds of types, individual types
    - Builds on Maya's multiple dispatch [Baker & Hsieh, PLDI '02]
  - Supplemented by AST analysis and transformation library
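
Because choices are ordered in a packrat grammar, where a new alternative is placed decides which alternative wins. The sketch below models the grammar rules from this slide as operations on a list of named alternatives. It is written in Python for illustration; the Choice class, its methods, and the alternative names are assumptions, not xtc's actual rule syntax.

```python
class Choice:
    """An ordered choice whose alternatives carry names (hypothetical model)."""

    def __init__(self, alternatives):
        # List of (name, body) pairs in priority order; earlier ones are tried first.
        self.alternatives = list(alternatives)

    def names(self):
        return [name for name, _ in self.alternatives]

    def _index(self, name):
        for i, (n, _) in enumerate(self.alternatives):
            if n == name:
                return i
        raise KeyError(name)

    def add(self, name, body, before=None, after=None):
        """Insert a new alternative, honoring optional before/after constraints."""
        lo = self._index(after) + 1 if after is not None else 0
        hi = self._index(before) if before is not None else len(self.alternatives)
        if lo > hi:
            raise ValueError("conflicting before/after constraints")
        # Only 'after' given: place right after it. Otherwise place right
        # before 'before', or at the end when there is no constraint.
        pos = lo if (after is not None and before is None) else hi
        self.alternatives.insert(pos, (name, body))

    def remove(self, name):
        del self.alternatives[self._index(name)]
```

For example, given a Statement choice with alternatives If, While, and Expression, calling add("Foreach", ..., before="Expression") tries the new construct before the catch-all expression statement. Contradictory constraints raise an error instead of being resolved silently, which is one simple way to surface conflicts between extensions.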

Macro Rules in Detail (cont.)
- Typing rules express safety constraints
  - Introduce new types
  - Express type compatibility (e.g., subtyping)
  - Deduce type of compound expressions, statements
  - Supported by symbol table and type records
    - Represented as collections of (nested) name/value pairs
- Dependency rules express inter-macro constraints
  - Specify other macros used by macro
    - E.g., closure macro building on object system
  - Group several macros into a larger extension
    - E.g., objects, GC, exceptions, threading/synchronization in one

The Beginnings of a Class Macro

    source {
      class $(id:identifier) {
        $(members:memberdeclarations)
      }
    }

    target {
      typedef struct $(gensym(id, "object")) {
        void *vtable;
        $(select("/FieldDeclaration", members))
      } * $id;
    }

    newtype {
      kind = Object,
      class = id,
      fields = createmap(
        select("/FieldDeclaration/Identifier", members),
        typeof(select("/FieldDeclaration/Type", members))),
      methods = ...
    }

The Beginnings of a Class Macro (cont.)

    source { $(exp:expression!object) . $(id:identifier) }
    target { $exp -> $id }
    type   { select("/fields/$id", typeof(exp)) }

- Starting to define supporting library functions
  - AST querying and (re-)construction (e.g., select())
    - Not shown: Splicing implicit pointer to self into methods
  - Typing (e.g., createmap(), typeof())
    - Not shown: Explicit symbol table access
  - Program analysis (e.g., freevars())
    - Not shown: Making implicit references to self explicit
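
As a rough illustration of what the two class-macro rules compute, the following Python sketch mimics their effect on strings and dictionaries. It is not the xtc implementation: the gensym naming scheme, the function names expand_class and expand_field_access, and the restriction of select() to type records (the slides also apply it to ASTs) are all assumptions. The type record is modeled as nested name/value pairs, as described for typing rules.

```python
import itertools

_fresh = itertools.count(1)

def gensym(base, suffix):
    """Return a fresh name; the exact naming scheme is an assumption."""
    return "{}_{}_{}".format(base, suffix, next(_fresh))

def expand_class(name, fields):
    """Mimic the 'target' and 'newtype' rules for: class name { fields }.

    fields is a list of (c_type, identifier) pairs.
    Returns (generated C source, type record).
    """
    struct_name = gensym(name, "object")
    members = "".join("  {} {};\n".format(t, f) for t, f in fields)
    source = "typedef struct {} {{\n  void *vtable;\n{}}} * {};".format(
        struct_name, members, name)
    record = {
        "kind": "Object",
        "class": name,
        "fields": {f: t for t, f in fields},  # role played by createmap()
    }
    return source, record

def select(path, record):
    """Walk a /-separated path through nested name/value pairs."""
    value = record
    for key in path.strip("/").split("/"):
        value = value[key]
    return value

def expand_field_access(exp, exp_type, field):
    """Mimic: source exp . field, target exp -> field, type /fields/<field>."""
    return "{} -> {}".format(exp, field), select("/fields/" + field, exp_type)
```

With a class Point holding two int fields x and y, expand_class yields a pointer-to-struct typedef named Point, and expand_field_access("p", record, "x") rewrites the access to p -> x while looking up the type of x in the record. An unknown field makes select() fail, which stands in for the static error a typing rule would report.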

More Research Issues
- Macro expansion order: Outside-in vs. inside-out
  - Outside-in supports local macro definitions
    - But requires parsing and macro expansion be interleaved
  - Inside-out required by some macros (e.g., dynamic codegen)
  - Either way, parsing and typing have to be interleaved
    - Need to distinguish between type names and regular identifiers in C
- Choice of base language
  - C has many redundant constructs (3 types of loops, ++, --)
    - Tedious to specify domain-specific optimizations
  - CIL provides better basis [Necula et al., CC '02]
    - Concise and formally defined language kernel

Even More Research Issues
- Expressiveness of macro module system
  - Direct dependencies are good enough to get us started
  - But we may want to provide more flexibility
    - I.e., there may be more than one way to implement a syntax
    - Think: Object layout, multiple/predicate dispatch
  - We are considering a macro "traits" facility
    - Macros implement protocols, comparable to interfaces
- What do you need for realizing your domain-specific optimizations in xtc?

Conclusions
- Metaprogramming can help us cope with increasing complexity of (distributed) systems
- A good solution needs to meet four requirements
  - Expressiveness, safety, composability, efficiency
- xtc strives to meet these requirements for C
  - Expresses macros as collections of rules
    - Grammar, AST transformation, typing, dependency
  - Builds on our toolkit for source-to-source transformers and on Rats!, our packrat parser generator
  - Leaves conventional optimizations and code generation to gcc

http://www.cs.nyu.edu/rgrimm/xtc/