Program generation for schema-based, typed data access

Similar documents
Using Scala for building DSL s

Plan. Language engineering and Domain Specific Languages. Language designer defines syntax. How to define language

Language engineering and Domain Specific Languages

Syntax and Grammars 1 / 21

Concepts of Programming Design

Collage of Static Analysis

New Programming Paradigms

Domain-Specific. Languages. Martin Fowler. AAddison-Wesley. Sydney Tokyo. With Rebecca Parsons

COS 320. Compiling Techniques

CIS 1.5 Course Objectives. a. Understand the concept of a program (i.e., a computer following a series of instructions)

Principles of Programming Languages [PLP-2015] Detailed Syllabus

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Meta-programming with Names and Necessity p.1

Compilers. Prerequisites

REFLECTIONS ON OPERATOR SUITES

Functional Programming in Hardware Design

COMP 181. Agenda. Midterm topics. Today: type checking. Purpose of types. Type errors. Type checking

Lecture Notes on Compiler Design: Overview

Big Bang. Designing a Statically Typed Scripting Language. Pottayil Harisanker Menon, Zachary Palmer, Scott F. Smith, Alexander Rozenshteyn

Frustrated by all the hype?

Introduction to Dependable Systems: Meta-modeling and modeldriven

CST-402(T): Language Processors

1. true / false By a compiler we mean a program that translates to code that will run natively on some machine.

CS 565: Programming Languages. Spring 2008 Tu, Th: 16:30-17:45 Room LWSN 1106

Combined Object-Lambda Architectures

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation

52. Staged Software Architectures with Staged Composition

General Purpose Languages Should be Metalanguages. Jeremy G. Siek University of Colorado at Boulder

Verified compilers. Guest lecture for Compiler Construction, Spring Magnus Myréen. Chalmers University of Technology

Compositional Model Based Software Development

Concepts of Programming Languages

Compiler Theory. (Semantic Analysis and Run-Time Environments)

CSE 12 Abstract Syntax Trees

Programming language design and analysis

Dynamic Cuda with F# HPC GPU & F# Meetup. March 19. San Jose, California

Domain-Specific Languages Language Workbenches

Compiling and Interpreting Programming. Overview of Compilers and Interpreters

Typing & Static Analysis of Multi-Staged Programs

Formal Systems and their Applications

Programming Languages Third Edition

Verifiable composition of language extensions

What Is Computer Science? The Scientific Study of Computation. Expressing or Describing

Defining Domain-Specific Modeling Languages

Scala, Your Next Programming Language

Why are there so many programming languages? Why do we have programming languages? What is a language for? What makes a language successful?

TYPE INFERENCE. François Pottier. The Programming Languages Mentoring ICFP August 30, 2015

TagSoup: A SAX parser in Java for nasty, ugly HTML. John Cowan

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

A case in Multiparadigm Programming : User Interfaces by means of Declarative Meta Programming

Working of the Compilers

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

Introducing Wybe a language for everyone

Lecture 09. Ada to Software Engineering. Mr. Mubashir Ali Lecturer (Dept. of Computer Science)

Turning proof assistants into programming assistants

xtc Robert Grimm Making C Safely Extensible New York University

JVM ByteCode Interpreter

Programming Languages

What do Compilers Produce?

Parser Design. Neil Mitchell. June 25, 2004

Adding GADTs to OCaml the direct approach

CS153: Compilers Lecture 15: Local Optimization

CS252 Advanced Programming Language Principles. Prof. Tom Austin San José State University Fall 2013

Overview of the Ruby Language. By Ron Haley

Optimising Functional Programming Languages. Max Bolingbroke, Cambridge University CPRG Lectures 2010

COP 3402 Systems Software. Lecture 4: Compilers. Interpreters

Computing Inside The Parser Syntax-Directed Translation. Comp 412 COMP 412 FALL Chapter 4 in EaC2e. source code. IR IR target.

SIR C R REDDY COLLEGE OF ENGINEERING

Informatica 3 Syntax and Semantics

Lecture 13: Subtyping

Who are we? Andre Platzer Out of town the first week GHC TAs Alex Crichton, senior in CS and ECE Ian Gillis, senior in CS

Chapter 11 :: Functional Languages

Semantic Modularization Techniques in Practice: A TAPL case study

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

Functional Languages. CSE 307 Principles of Programming Languages Stony Brook University

CS152 Programming Language Paradigms Prof. Tom Austin, Fall Syntax & Semantics, and Language Design Criteria

An Evaluation of Domain-Specific Language Technologies for Code Generation

4. An interpreter is a program that

By: Chaitanya Settaluri Devendra Kalia

Seminar in Programming Languages

St. MARTIN S ENGINEERING COLLEGE Dhulapally, Secunderabad

CSE 505 Graduate PL. Fall 2013

DOT NET Syllabus (6 Months)

CS 415 Midterm Exam Spring 2002

Programming Language Pragmatics

MetaML: structural reflection for multi-stage programming (CalcagnoC, MoggiE, in collaboration with TahaW)

First-class Implementations

Compilers and Interpreters

Index. Comma-separated values (CSV), 30 Conditional expression, 16 Continuous integration (CI), 34 35, 41 Conway s Law, 75 CurrencyPair class,

Programming Languages (PL)

Compiler Design Spring 2018

Topics Covered Thus Far. CMSC 330: Organization of Programming Languages. Language Features Covered Thus Far. Programming Languages Revisited

List of Figures. About the Authors. Acknowledgments

Programming Languages Fall Prof. Liang Huang

CSE 413 Languages & Implementation. Hal Perkins Winter 2019 Structs, Implementing Languages (credits: Dan Grossman, CSE 341)

Formal editing: jedit-mmt. Narrative editing: LaTeX-MMT. Browsing: MMT web server. Building: MMT scripting language. The MMT API: A Generic MKM System

ADMINISTRATIVE MANAGEMENT COLLEGE

Programmiersprachen (Programming Languages)

CSE 70 Final Exam Fall 2009

Crafting a Compiler with C (II) Compiler V. S. Interpreter

CSSE 490 Model-Based Software Engineering: Transformational Programming

Transcription:

Program generation for schema-based, typed data access Ralf Lämmel Software Engineer Facebook, London

Program generation A use case at Facebook Purpose of generation: typed data access ("O/R mapping" et al.) Meta = Object language = Hack (not to be confused with PHP) Input for generation: schemas (or mapping definitions)

Generate access code from schema We illustrate the use case with opensource software. https://code.facebook.com/posts/1624644147776541/writing-code-that-writes-code-with-hack-codegen/

The underlying codegen library We illustrate the use case with opensource software. https://code.facebook.com/posts/1624644147776541/writing-code-that-writes-code-with-hack-codegen/

Signing of generated code Key ideas: Generate hash code from generated code. Prevent diffs that change the generated code. We illustrate the use case with opensource software. https://code.facebook.com/posts/1624644147776541/writing-code-that-writes-code-with-hack-codegen/

Further reading Introducing "hack-codegen" https://code.facebook.com/posts/1624644147776541/writing-code-that-writes-code-with-hack-codegen/ Open source code base for hack-codegen https://github.com/hhvm/hack-codegen Typed data access proof of concept (some form of EntSchema) https://github.com/hhvm/hack-codegen/tree/master/examples/dorm Docs on hack-codegen http://hhvm.github.io/hack-codegen/ Information on the "Ent" framework https://github.com/facebook/dataloader This is publicly available information which does not perfectly match Facebook's internal approach.

Program generation challenges not just necessarily within the chosen scope at Facebook Metaprogramming support? Customization of generated code? Integration of generation with development? Complexity of generator! Size of the generated code!... Let's look at these challenges, one by one, in terms of the aforementioned use case and its Facebook instance.

Metaprogramming support? Staging? Compile- or runtime generation may be too slow. It may disrupt development (e.g., source-code-based tooling). Staged programming support was not there to start with. Type safety? The generators are tested a lot. Concrete object syntax? Code construction is served by fluent APIs. There is a lot of logic; there is less templates.

Customization of generated code Manual sections are removed from signature for signing. Signing helps with avoiding unintended changes. Manual sections are kept when re-generating code. - We may need to manage partially generated code. - Changes of defaults may be non-obvious. - Customized code may fail to break when it should. - Host language may limit factoring of customization. We illustrate the use case with opensource software. https://code.facebook.com/posts/1624644147776541/writing-code-that-writes-code-with-hack-codegen/

Integration of program generation with development + We generate code to test generated code. - Generated code may be out of sync with database. - Generated code may permit too much on database. - Generated code may swamp version control. - Generated code is in the face of the developer.

Complexity of generator - Developers may face many different files per schema. - The code generator is tied into lifecycle management. - The codebase may get locked into large API. - The evolution of the generator may be difficult.

Size of the generated code - We start to have a problem, when the size of generated code is in the same ballpark as the size of the genuine code. Think of the time to... re-generate, teach IDE, compile, run tests, commit,...

Wanted! Improvements: Generate less code! Compile less code! Version-control much less code! Don't think so much in terms of code! Better separate generated and non-generated code! Preservation: Data access is typed! Get in touch, if interested. Provide IDE support such as code completion! Enable intuitive customization of generated code!

Towards a relatively systematic literature survey on program generation

Questions of interest Broad category Does the approach address a case study (on program generation), or language support (for program generation), or an application domain (of program generation), or something else? In reality, approaches may combine all of these.

Questions of interest Application domain If the approach addresses a case study, then what is the underlying application domain? If the approach addresses language support, then does it point at some application domain? If the approach addresses an application domain, then which one is it and how well does it relate to data access?

Questions of interest Language aspects What is the object language? What is the metalanguage? What language support is facilitated? What infrastructural support (e.g., a library) is facilitated? In reality, meta = object language may be a common case, of course.

Questions of interest Generation time Is the code generated at... (RT) run time or (CT) compile time or (BT) build time (i.e., ahead of compile time)? Does the generated code provide guarantees... at run time (e.g., typed data access) or at compile time (e.g., well-typedness)?

Questions of interest Customization Is customization of generated code facilitated, and, if so,... is customization achieved by some form of specialization (e.g., subclassing) or some form of code changes, and, if so,... is round-tripping / synchronization supported, and, if so,... by what means and to what extent, or by other means and is such customization essential, and, if so, because of which specifics (language or domain)?

Questions of interest Software engineering How is code generation addressed by the software building? How is code generation addressed by software testing? How is code generation addressed within the IDE? What other software engineering aspects are addressed?

Questions of interest System design Does the generated code entail... a black/gray box (e.g., a parser)... as a service or at the command line or a library (essentially an interface) or a framework (typically requiring specialization) or a template (typically requiring some code changes)?

Venues GPCE https://dblp.uni-trier.de/db/conf/gpce/ DSL https://dblp.uni-trier.de/db/conf/dsl/ SAIG https://dblp.uni-trier.de/db/conf/saig/ GTTSE https://dblp.uni-trier.de/db/conf/gttse/ SLE https://dblp.uni-trier.de/db/conf/sle/ PEPM https://dblp.uni-trier.de/db/conf/pepm/ ICFP https://dblp.uni-trier.de/db/conf/icfp/ POPL https://dblp.uni-trier.de/db/conf/popl/ PLDI https://dblp.uni-trier.de/db/conf/pldi/...

GPCE 2017 papers [...] Case study Language support Application domain Software engineering Time Customization Typed data access HSpiral Search CT CaVa Multi-lingual & distributed BT Guix RT PAX Synthesis BT QSR MetaML Library optimization CT Formal semantics RT Haskino Debugging BT Silverchain SpiralS Usability of APIs High performance BT BT Control Type soindness RT Tensor Reusability BT

[HSpiral] A Haskell Compiler for Signal Transforms (GPCE'17) Domain: signal transforms (FFT) Meta: Haskell Object: Haskell, C, deep embedded DSLs Concepts: type classes, monads, GADTs, type families, index types, rewriting, compilation (to C), deep embedding, quasi-quotation, search (for optimization)

[CaVa] Automatic Generation of Virtual Learning Spaces Driven by CaVaDSL: An Experience Report (GPCE'17) Domain: virtual learning spaces for cultural heritage Meta: Java Object: scripts, SPARQL, HTML, CSS, PHP,... Concepts: external DSL, multi-lingual and distributed target

[Guix] Code Staging in GNU Guix (GPCE'17) Domain: OS configuration and system administration Meta: Scheme Object: embedded DSLs for package definitions and build actions Concepts: staging, macros, S-expressions

[PAX] Parser Generation by Example for Legacy Pattern Languages (GPCE'17) Domain: parsing for legacy languages Meta: C# (for metamodel inference and C# generation) Object: metamodels (for patterns), C# Concepts: pattern synthesis (grammar inference), finite state machines

[QSR] Quoted Staged Rewriting: A Practical Approach to Library-De ned Optimizations (GPCE'17) Domain: stream fusion Meta: Scala Object: Scala Concepts: staging, quasi-quotation, rewriting, macros, CPS, inlining

[MetaML] Refining Semantics for Multi-stage Programming (GPCE'17) Domain: None (run-time code generation by staging) Meta: MetaML Object: ML Concepts: explicit substitutions, structural operational semantics

[Haskino] Rewriting a Shallow DSL using a GHC Compiler Extension (GPCE'17) Domain: embedded systems / microcontrollers Meta: Haskell Object: C Concepts: shallow and deep embedding, debugging, firmware interpretation, translation, monads, shallow-todeep transformation, compiler plugin

[Silverchain] Silverchain: A Fluent API Generator (GPCE'17) Domain: fluent API generation Meta: not defined (Java?) Object: grammars (input), class definitions (output) Concepts: fluent APIs, deterministic push-down automata, BNF

[SpiralS] Staging for Generic Programming in Space and Time (GPCE'17) Domain: convolution on images and FFT Meta: Scala Object: Java Concepts: lightweight modular staging (LMS), polymorphism, generic programming

[Control] Staging with Control: Type-Safe Multi-stage Programming with Control Operators (GPCE'17) Domain: None (run-time code generation by staging) Meta: None (a calculus in MetaML style) Object: None Concepts: staging, control operators (shift0, reset0), computational effects, type soundness, type inference

[Tensor] Towards Compositional and Generative Tensor Optimizations (GPCE'17) Domain: Tensor computations (quantum chemistry and physics, big data analysis, machine learning, computational fluid dynamics) Meta: Undefined Object: C, custom intermediate language Concepts: intermediate language

Thanks!