Growing Solver-Aided Languages with ROSETTE Emina Torlak & Rastislav Bodik U.C. Berkeley ExCAPE June 10, 2013
solver-aided domain-specific language
Solver-aided DSL (SDSL). Noun. 1. A high-level language in which partially implemented programs can be executed, verified, debugged and synthesized with the aid of a constraint solver.
programming
specification: formula, input/output pairs, traces, another program, …
assume pre(x)
P(x) { }
assert post(P(x))
programming with a solver-aided language?
assume pre(x)
P(x) { }
assert post(P(x))
program + spec → translate( ) → SAT/SMT solver
solver-aided programming: code checking
Is there a valid input x for which P(x) violates the spec?
assume pre(x)
P(x) { }
assert post(P(x))
∃x. pre(x) ∧ ¬post(P(x)) → SAT/SMT solver → model → counterexample (x = 42)
CBMC [Oxford], Dafny [MSR], Jahob [EPFL], Miniatur / MemSAT [IBM], etc.
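The code-checking query can be sketched without a solver by searching a small, bounded input domain for an x that satisfies the precondition but violates the postcondition. This is only a brute-force stand-in for the SAT/SMT query; `find_counterexample`, `buggy_abs`, and the spec below are hypothetical names and examples, and the postcondition takes the input explicitly as `post(x, y)` for convenience.

```python
from typing import Callable, Iterable, Optional


def find_counterexample(
    program: Callable[[int], int],
    pre: Callable[[int], bool],
    post: Callable[[int, int], bool],
    domain: Iterable[int],
) -> Optional[int]:
    """Search for x with pre(x) and not post(x, program(x)); None if the domain has none."""
    for x in domain:
        if pre(x) and not post(x, program(x)):
            return x  # counterexample
    return None


# Hypothetical program under test: absolute value with a wrong branch condition.
def buggy_abs(x: int) -> int:
    return x if x > -3 else -x


def pre(x: int) -> bool:
    return -8 <= x < 8


def post(x: int, y: int) -> bool:
    return y >= 0 and y in (x, -x)


counterexample = find_counterexample(buggy_abs, pre, post, range(-8, 8))
print(counterexample)  # -2
```

A solver answers the same question symbolically, without enumerating inputs, which is what makes the approach scale beyond toy domains.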
solver-aided programming: localizing faults
Given x and x′, what subset of P is responsible for P(x) ≠ x′?
assume pre(x)
P(x) { v = x + 2 }
assert post(P(x))
pre(x) ∧ post(x′) ∧ x′ = P(x) → SAT/SMT solver → MAXSAT / MIN CORE → repair candidates
BugAssist [UCLA / MPI-SWS]
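One way to picture fault localization: given a failing input and the expected output, look for the smallest sets of statements that, if allowed to compute anything, would let the program produce the expected output. The sketch below does this by brute force over a tiny value range; it is not the MAXSAT procedure used by BugAssist, and the three-statement program and all names are hypothetical.

```python
from itertools import combinations, product

# Hypothetical straight-line program; each statement assigns one variable.
STATEMENTS = [
    ("v", lambda env: env["x"] + 2),
    ("u", lambda env: env["x"] * 3),    # does not flow into the output
    ("out", lambda env: env["v"] * 2),
]


def run(x, overrides):
    """Execute the program; statement i yields overrides[i] instead of its own result if present."""
    env = {"x": x}
    for i, (var, rhs) in enumerate(STATEMENTS):
        env[var] = overrides[i] if i in overrides else rhs(env)
    return env["out"]


def repair_candidates(x, expected, values):
    """Smallest sets of statement indices that, when freed, allow output == expected."""
    indices = range(len(STATEMENTS))
    for size in range(1, len(STATEMENTS) + 1):
        found = [
            subset
            for subset in combinations(indices, size)
            if any(
                run(x, dict(zip(subset, picks))) == expected
                for picks in product(values, repeat=size)
            )
        ]
        if found:
            return found
    return []


candidates = repair_candidates(x=3, expected=2, values=range(-16, 17))
print(candidates)  # [(0,), (2,)]
```

Statement 1 is never reported because no value of `u` can change the output, which mirrors how a minimal core excludes irrelevant code.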
solver-aided programming: angelic execution
Given x, choose v at runtime so that P(x, v) satisfies the spec.
assume pre(x)
P(x) { v = choose() }
assert post(P(x))
∃v. pre(x) ∧ post(P(x, v)) → SAT/SMT solver → model → trace (v = 0, …)
Kaplan [EPFL], PBnJ [UCLA], Skalch [Berkeley], Squander [MIT], etc.
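Angelic execution can be approximated by trying candidate values for the choice until the assertion holds. The sketch below assumes a single choice point and a small finite set of candidates; `angelic_run` and the example program are hypothetical, and a real system would ask the solver for v instead of enumerating.

```python
def angelic_run(program, x, pre, post, choices):
    """Return (v, output) for the first v in choices that satisfies the spec, else None."""
    if not pre(x):
        raise ValueError("precondition violated")
    for v in choices:
        y = program(x, v)
        if post(x, y):
            return v, y
    return None


# Hypothetical program with one angelic choice v.
def program(x, v):
    return x + v * v


result = angelic_run(
    program,
    x=9,
    pre=lambda x: x >= 0,
    post=lambda x, y: y == 25,
    choices=range(10),
)
print(result)  # (4, 25)
```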
solver-aided programming: synthesis
Replace ?? with an expression e so that P_e(x) satisfies the spec on all valid inputs.
assume pre(x)
P(x) { v = ?? }
assert post(P(x))
∃e. ∀x. pre(x) ⇒ post(P_e(x)) → SAT/SMT solver → model → expression for ?? (built from x and 2)
Comfusy [EPFL], Sketch [Berkeley / MIT]
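The synthesis query has an exists-forall shape: find one expression for the hole that works for every valid input. A minimal sketch, assuming a tiny candidate grammar (`x op c`) and a bounded input domain, enumerates candidates and checks each against all inputs. The grammar, the sketch program, and the spec are invented for illustration; tools such as Sketch solve this with a solver, not by enumeration.

```python
import operator
from itertools import product

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def synthesize(sketch, pre, post, constants, domain):
    """Find 'x op c' for the hole so that post holds on every valid x in domain."""
    for (name, op), c in product(OPS.items(), constants):
        def hole(x, op=op, c=c):
            return op(x, c)

        # The forall part: the candidate must satisfy the spec on all valid inputs.
        if all(post(x, sketch(x, hole)) for x in domain if pre(x)):
            return f"x {name} {c}"
    return None


# Hypothetical sketch: the program with a hole in place of ??.
def sketch(x, hole):
    v = hole(x)  # v = ??
    return v * 2


expr = synthesize(
    sketch,
    pre=lambda x: True,
    post=lambda x, y: y == 2 * x + 6,
    constants=range(5),
    domain=range(-8, 8),
)
print(expr)  # x + 3
```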
but building solver-aided languages is hard
programs in different languages: P(x) { }   ???   (Q x)   R(x) :
each language has its own translate( ) to the SAT/SMT solver
Each new SDSL is created by careful custom compilation to constraints, requiring years of training and experience.
a solver-aided framework for building SDSLs
programs in different languages: P(x) { }   ???   (Q x)   R(x) :
each language reaches ROSETTE through interpret( ) or API( )
Implement a library or an interpreter for your SDSL, and get a synthesizer, verifier, debugger and angelic oracle for programs in that SDSL.
a tiny solver-aided extension of racket

Racket:

top-level-form = general-top-level-form
               | (#%expression expr)
               | (module id name-id (#%plain-module-begin module-level-form ...))
               | (begin top-level-form ...)
               | (begin-for-syntax top-level-form ...)

module-level-form = general-top-level-form
                  | (#%provide raw-provide-spec ...)
                  | (begin-for-syntax module-level-form ...)

general-top-level-form = expr
                       | (define-values (id ...) expr)
                       | (define-syntaxes (id ...) expr)
                       | (#%require raw-require-spec ...)

expr = id
     | (#%plain-lambda formals expr ...+)
     | (case-lambda (formals expr ...+) ...)
     | (if expr expr expr)
     | (begin expr ...+)
     | (begin0 expr expr ...)
     | (let-values ([(id ...) expr] ...) expr ...+)
     | (letrec-values ([(id ...) expr] ...) expr ...+)
     | (set! id expr)
     | (quote datum)
     | (quote-syntax datum)
     | (with-continuation-mark expr expr expr)
     | (#%plain-app expr ...+)
     | (#%top . id)
     | (#%variable-reference id)
     | (#%variable-reference (#%top . id))
     | (#%variable-reference)

ROSETTE (forms added to expr):

     | (define-symbolic id expr)
     | (assert expr)
     | (solve expr)
     | (verify expr)
     | (debug [expr] expr)
     | (synthesize #:forall expr #:guarantee expr)

formals = (id ...)
        | (id ...+ . id)
        | id
with a symbolic evaluator and compiler
queries: debug, solve, verify, synthesize
SDSL + program → ROSETTE (in racket): transform, evaluate & compile to constraints → KODKOD, SAT solver → map solution to program level
Easy to connect to another solver or backend, such as SMT or SynthLib.
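The idea of evaluating a program on symbolic inputs, collecting constraints, and mapping a solution back to program-level values can be illustrated with a toy. In this sketch a `Sym` class records arithmetic as an expression tree, and `solve` is a brute-force stand-in for the solver backend. None of these names come from Rosette, and the real system handles far more than integer arithmetic.

```python
from itertools import product


class Sym:
    """A symbolic integer expression stored as a tree: an operator and its arguments."""

    def __init__(self, op, *args):
        self.op, self.args = op, args

    def __add__(self, other):
        return Sym("+", self, lift(other))

    def __mul__(self, other):
        return Sym("*", self, lift(other))

    def equals(self, other):
        return Sym("==", self, lift(other))


def lift(value):
    """Wrap a concrete integer as a constant; leave symbolic values alone."""
    return value if isinstance(value, Sym) else Sym("const", value)


def evaluate(expr, model):
    """Evaluate an expression tree under a binding of variable names to integers."""
    if expr.op == "const":
        return expr.args[0]
    if expr.op == "var":
        return model[expr.args[0]]
    left, right = (evaluate(arg, model) for arg in expr.args)
    if expr.op == "+":
        return left + right
    if expr.op == "*":
        return left * right
    return left == right  # the only remaining operator is "=="


def solve(assertions, names, domain):
    """Brute-force stand-in for the solver: first binding satisfying all assertions."""
    for values in product(domain, repeat=len(names)):
        model = dict(zip(names, values))
        if all(evaluate(a, model) for a in assertions):
            return model
    return None


# A hypothetical program, written as ordinary code and run on a symbolic input.
def program(x):
    return x * 2 + 3


x = Sym("var", "x")
constraint = program(x).equals(11)  # running the program builds the constraint
model = solve([constraint], ["x"], range(-8, 8))
print(model)  # {'x': 4}
```

The program itself is unchanged ordinary code; only the values flowing through it are symbolic, which is the property that lets a host-language interpreter double as a compiler to constraints.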
web, spatial programming, superoptimization
websynth: web scraping by demonstration
Web scraping is usually done with fragile, hand-written scripts using regexes.
Websynth synthesizes a web scraper from a URL and a few examples of the desired data, using a declarative SDSL (XPaths).
Implemented by two undergraduate students in a few weeks.
4 seconds in total to construct a 1100+ node DOM, synthesize the XPath, and return the data.
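Scraping by demonstration can be sketched as a search for a path expression that selects every example the user gave. The toy below represents a DOM as nested tuples and considers only plain tag paths such as /html/body/ul/li; it is a hypothetical illustration and not Websynth's XPath encoding.

```python
def leaf_paths(node, prefix=()):
    """Yield (tag_path, text) for each text leaf; a node is (tag, text) or (tag, [children])."""
    tag, content = node
    path = prefix + (tag,)
    if isinstance(content, str):
        yield path, content
    else:
        for child in content:
            yield from leaf_paths(child, path)


def synthesize_xpath(dom, examples):
    """Return (xpath, selected_texts) for the first tag path that covers all examples."""
    leaves = list(leaf_paths(dom))
    for path in dict.fromkeys(p for p, _ in leaves):  # distinct paths, document order
        selected = [text for p, text in leaves if p == path]
        if all(example in selected for example in examples):
            return "/" + "/".join(path), selected
    return None


# Hypothetical page, invented for illustration.
DOM = ("html", [
    ("body", [
        ("h1", "Prices"),
        ("ul", [("li", "apple"), ("li", "pear"), ("li", "plum")]),
        ("p", "apple is on sale"),
    ]),
])

scraper = synthesize_xpath(DOM, ["apple", "plum"])
print(scraper)  # ('/html/body/ul/li', ['apple', 'pear', 'plum'])
```

Two examples are enough here to generalize to the whole list, including the item ("pear") the user never pointed at.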
partitioning code & data for a low power chip
GreenArrays GA144 Processor
[Figure: Instructions/Second vs Power, ~100x. Figure by Per Ljung.]
[Background image: page from the GreenArrays "DB003 Evaluation Board Reference for EVB001".]
Stack-based 18-bit architecture
32 instructions
8 x 18 array of asynchronous cores
No shared resources (cache, memory)
Limited communication, neighbors only
< 300 byte memory per core
Manual function partitioning: break functions up into a pipeline with a few operations per core.
[Drawing by Mangpo Phothilimthana]
partitioning code & data for a low power chip
High-Level Program → Partitioner → Per-core High-Level Programs → Code Generator → Per-core Optimized Machine Code
Labels: New Programming Model; New Approach Using Synthesis
Mangpo Phothilimthana and Nishant Totla (first year graduate students)
partitioning code & data for a low power chip
one iteration of MD5 hash: optimal code partitioning by the synthesizer
[Figure, three panels showing operations assigned to cores: (1) 256-byte mem per core, initial data placement specified; (2) 512-byte mem per core, different initial data placement; (3) 512-byte mem per core, same initial data placement.]
component-based synthesis of loop-free code
A clever algorithm for synthesis of bitvector programs by Gulwani et al., PLDI '11.
Produces a loop-free program from a multiset of instructions and a functional spec (superoptimization).
The encoding cannot be reproduced with tools that use hardwired, general-purpose synthesis algorithms.
600 lines of Rosette, and performance matches the published algorithm.
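The problem statement (a multiset of components plus a functional spec yields a loop-free program) can be illustrated by brute force: try every ordering of the components and every way of wiring each component's inputs to the program input or an earlier result, and keep the first program that matches the spec on all inputs. This sketch assumes 4-bit values and that each component is used exactly once; it enumerates instead of using the constraint encoding of Gulwani et al., and every name in it is hypothetical.

```python
from itertools import permutations, product

WIDTH = 4
MASK = (1 << WIDTH) - 1

# Component multiset: (name, arity, function on WIDTH-bit values).
COMPONENTS = [
    ("dec", 1, lambda a: (a - 1) & MASK),
    ("and", 2, lambda a, b: a & b),
]


def run(order, wiring, x):
    """Run a straight-line program; values[0] is the input, values[i] the result of line i."""
    values = [x]
    for (_name, _arity, fn), sources in zip(order, wiring):
        values.append(fn(*(values[s] for s in sources)))
    return values[-1]  # the last line produces the output


def synthesize(components, spec):
    """Return [(component, input_indices), ...] matching spec on all inputs, or None."""
    for order in permutations(components):
        # Line i may read the input (index 0) or the result of any earlier line.
        options = [
            list(product(range(i + 1), repeat=arity))
            for i, (_name, arity, _fn) in enumerate(order)
        ]
        for wiring in product(*options):
            if all(run(order, wiring, x) == spec(x) for x in range(1 << WIDTH)):
                return [(name, srcs) for (name, _a, _f), srcs in zip(order, wiring)]
    return None


# Spec: clear the lowest set bit of x.
program = synthesize(COMPONENTS, lambda x: x - (x & -x))
print(program)  # [('dec', (0,)), ('and', (0, 1))]
```

The result reads as `t1 = dec(x); t2 = and(x, t1)`, the familiar `x & (x - 1)` idiom.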
thank you!
ROSETTE: code checking, angelic execution, debugging, synthesis