Combining Static and Dynamic Typing in Ruby: Two Approaches. Jeff Foster University of Maryland, College Park

Similar documents
Adding Static Typing to Ruby

Safe Programming in Dynamic Languages Jeff Foster University of Maryland, College Park

Semantic Analysis. Lecture 9. February 7, 2018

CS 5142 Scripting Languages

Profile-Guided Static Typing for Dynamic Scripting Languages

ABSTRACT. are typically complex, poorly specified, and include features, such as eval

Profile-Guided Static Typing for Dynamic Scripting Languages

CSE 413 Languages & Implementation. Hal Perkins Winter 2019 Structs, Implementing Languages (credits: Dan Grossman, CSE 341)

! Broaden your language horizons. ! Study how languages are implemented. ! Study how languages are described / specified

CSE 341, Autumn 2015, Ruby Introduction Summary

Ruby: Introduction, Basics

CSE341: Programming Languages Lecture 19 Introduction to Ruby and OOP. Dan Grossman Winter 2013

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

Static Type Inference for Ruby

! Broaden your language horizons! Different programming languages! Different language features and tradeoffs. ! Study how languages are implemented

Language and Library API Design for Usability of Ruby

CA Compiler Construction

Ruby logistics. CSE341: Programming Languages Lecture 19 Introduction to Ruby and OOP. Ruby: Not our focus. Ruby: Our focus. A note on the homework

CS558 Programming Languages

The role of semantic analysis in a compiler

Ruby: Introduction, Basics

Big Bang. Designing a Statically Typed Scripting Language. Pottayil Harisanker Menon, Zachary Palmer, Scott F. Smith, Alexander Rozenshteyn

Types. Type checking. Why Do We Need Type Systems? Types and Operations. What is a type? Consensus

Compilers. Type checking. Yannis Smaragdakis, U. Athens (original slides by Sam

Intro. Scheme Basics. scm> 5 5. scm>

CSCI-GA Scripting Languages

Type Checking in COOL (II) Lecture 10

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages. OCaml Imperative Programming

Semantic Analysis and Type Checking

Expressions and Assignment

Introduction CMSC 330: Organization of Programming Languages. Books on Ruby. Applications of Scripting Languages

Topics Covered Thus Far. CMSC 330: Organization of Programming Languages. Language Features Covered Thus Far. Programming Languages Revisited

Dialects of ML. CMSC 330: Organization of Programming Languages. Dialects of ML (cont.) Features of ML. Functional Languages. Features of ML (cont.

Dynamic Languages Strike Back. Steve Yegge Stanford EE Dept Computer Systems Colloquium May 7, 2008

why you should use Ruby

Your First Ruby Script

Static Analysis of Dynamically Typed Languages made Easy

6.001 Notes: Section 15.1

Project 1: Scheme Pretty-Printer

Object-oriented programming. and data-structures CS/ENGRD 2110 SUMMER 2018

Dynamically-typed Languages. David Miller

Principles of Programming Languages

CMSC 330 Exam #2 Fall 2008

COMP 181. Agenda. Midterm topics. Today: type checking. Purpose of types. Type errors. Type checking

CS101 Introduction to Programming Languages and Compilers

5. Semantic Analysis!

Dynamic Inference of Static Types for Ruby

Outline. Java Models for variables Types and type checking, type safety Interpretation vs. compilation. Reasoning about code. CSCI 2600 Spring

Topic 9: Type Checking

Topic 9: Type Checking

Installation Download and installation instructions can be found at

Notes from a Short Introductory Lecture on Scala (Based on Programming in Scala, 2nd Ed.)

PyPy - How to not write Virtual Machines for Dynamic Languages

Lecture 7: Type Systems and Symbol Tables. CS 540 George Mason University

1 Lexical Considerations

Ruby. Mooly Sagiv. Most slides taken from Dan Grossman

Operational Semantics. One-Slide Summary. Lecture Outline

Intermediate Code Generation

Topics Covered Thus Far CMSC 330: Organization of Programming Languages

CSE450. Translation of Programming Languages. Lecture 11: Semantic Analysis: Types & Type Checking

CS558 Programming Languages Winter 2013 Lecture 8

Programming Languages, Summary CSC419; Odelia Schwartz

Dynamic Inference of Static Types for Ruby

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler so far

Object-Oriented Design Lecture 14 CSU 370 Fall 2007 (Pucella) Friday, Nov 2, 2007

CSE 431S Final Review. Washington University Spring 2013

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division. P. N. Hilfinger

Binghamton University. CS-140 Fall Dynamic Types

The Compiler So Far. CSC 4181 Compiler Construction. Semantic Analysis. Beyond Syntax. Goals of a Semantic Analyzer.

Types and Type Inference

Type Checking. Outline. General properties of type systems. Types in programming languages. Notation for type rules.

Outline. General properties of type systems. Types in programming languages. Notation for type rules. Common type rules. Logical rules of inference

Ruby: Introduction, Basics

2.2 Syntax Definition

The Hack programming language:

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division. P. N. Hilfinger

Programming Paradigms

CMSC 330: Organization of Programming Languages. Formal Semantics of a Prog. Lang. Specifying Syntax, Semantics

SYMBOLS. you would not assign a value to a symbol: :name = "Fred" The Book of Ruby 2011 by Huw Collingbourne

Project 5 Due 11:59:59pm Tue, Dec 11, 2012

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

Compilers and computer architecture: Semantic analysis

CSE 413 Spring Introduction to Ruby. Credit: Dan Grossman, CSE341

Ruby on Rails TKK, Otto Hilska

Software Development Time. Compilers in Real Life. Lessons from Real Life. Software Maintenance. Solutions for Real Life. Celebrity Endorsements

Project 6 Due 11:59:59pm Thu, Dec 10, 2015

Project 4 Due 11:59:59pm Thu, May 10, 2012

Project 5 Due 11:59:59pm Wed, July 16, 2014

If Statements, For Loops, Functions

Implementation of F# language support in JetBrains Rider IDE

Computer Components. Software{ User Programs. Operating System. Hardware

Writing Evaluators MIF08. Laure Gonnord

CMSC 330: Organization of Programming Languages. OCaml Expressions and Functions

Chapter 5 Methods. public class FirstMethod { public static void main(string[] args) { double x= -2.0, y; for (int i = 1; i <= 5; i++ ) { y = f( x );

In Our Last Exciting Episode

Why are there so many programming languages? Why do we have programming languages? What is a language for? What makes a language successful?

CMSC 330: Organization of Programming Languages. OCaml Imperative Programming

Tail Calls. CMSC 330: Organization of Programming Languages. Tail Recursion. Tail Recursion (cont d) Names and Binding. Tail Recursion (cont d)

CS2900 Introductory Programming with Python and C++ Kevin Squire LtCol Joel Young Fall 2007

Transcription:

Combining Static and Dynamic Typing in Ruby: Two Approaches Jeff Foster University of Maryland, College Park

Introduction Scripting languages are extremely popular Lang Rating Lang Rating 1 C 17% 8 *Python 3.2% 2 Java 16.2% 9 *JavaScript 2% 3 C++ 8.7% 10 Transact-SQL 2% 4 Objective-C 8.6% 11 * VB.NET 1.8% 5 *PHP 6.4% 12 *Perl 1.7% 6 C# 5.6% 13 *Ruby 1.4% 7 *Visual Basic 4.8% 14 Object Pascal 0.9% *Scripting language TIOBE Index, September 2013 (based on search hits) Scripting languages are great for rapid development Time from opening editor to successful run of the program is small Rich libraries, flexible syntax, domain-specific support (e.g., regexps, syscalls) 2

Dynamic Typing Most scripting languages have dynamic typing def foo(x) y = x + 3;... # no decls of x or y Benefits Programs are shorter Java class A { public static void main(string[] args) { System.out.println( Hello, world! ); } } No type errors unless program about to go wrong Possible coding patterns very flexible Seems good for rapid development Ruby puts Hello, world! 3

Drawbacks Errors remain latent until run time No static types to serve as (rigorously checked) documentation Code evolution and maintenance may be harder E.g., no static type system to ensure refactorings are type correct 4

Diamondback Ruby

Diamondback Ruby (DRuby) Research goal: Develop a type system for scripting langs. Simple for programmers to use Flexible enough to handle common idioms Provides useful checking where desired Reverts to run time checks where needed DRuby: Adding static types to Ruby Ruby becoming popular, especially for building web apps A model scripting language - Based on Smalltalk, and mostly makes sense internally 6

First Part of This Talk RIL: The Ruby Intermediate Language Small, easy to analyze subset of Ruby Static type inference for Ruby Type system is rich enough to handle many common idioms Profile-based analysis for highly dynamic features Reflection, eval, method_missing, etc Joint work with Mike Furr, David An, Mike Hicks, Mark Daly, Avik Chaudhuri, and Ben Kirzhner 7

On The Design of Ruby In usual language design, there are several good properties: consistency simplicity orthogonality flexibility succinctness intuitiveness DRY (Don t Repeat Yourself) good names generalness naturalness meets programmers common sense In the design policy of Ruby, they are also good properties. Akira Tanaka, Language and Library API Design for Usability of Ruby, PLATEAU 2009 8

On The Design of Ruby (cont d) However, sometimes Ruby overrides the properties for usability. Ruby [doesn t] need consistency including rare usage. Ruby [doesn t] need succinctness including rare usage. Ruby [doesn t] need orthogonality including rare usage. Ruby [doesn t] need simplicity including rare usage. 9

http://www.flickr.com/photos/nnecapa/2868248691/in/set-72157607404941040

Ruby Intermediate Language (RIL) A front-end for Ruby code analysis and transformation Key features GLR parser for Ruby source code Compact, simplified intermediate representation Pretty-printer that outputs valid, executable Ruby code Partial reparsing module to make code transformation easier Dataflow analysis engine Support for run-time profiling 11

Parsing Ruby Source [Ruby should] feel natural to programmers Yukihiro Matsumoto Result: Grammar not amenable to LL/LR parsing Ruby s own parser is complex, written in C, tied to interpreter Solution: A GLR parser for Ruby Grammar productions may be ambiguous Ambiguities resolved eventually to yield one final parse tree 12

Intermediate Representation Ruby has many ways to do the same thing if p then e / e if p / unless (not p) e / e unless (not p) Control flow in Ruby can be complex In w = x().y(z()) does x() or z() occur first? Need to know this to build flow-sensitive analyses Ruby has some weird behavior x = a # error if a undefined if false then a = 3 end; x = a; # sets x to nil (!) RIL: Simplifies this all away 24 stmt kinds, each with only one side effect, organized as CFG Much easier to analyze than unsimplified Ruby 13

Static Types for Ruby How do we build a static type system that accepts reasonable Ruby programs? What idioms do Ruby programmers use? Are Ruby programs even close to statically type safe? Goal: Keep the type system as simple as possible Should be easy for programmer to understand Should be predictable We ll illustrate our typing discipline on the core Ruby standard library 14

The Ruby Standard Library Ruby comes with a bunch of useful classes Fixnum (integers), String, Array, etc. However, these are implemented in C, not Ruby Type inference for Ruby isn t going to help! Our approach: type annotations We will ultimately want these for regular code as well Standard annotation file base_types.rb 185 classes, 17 modules, and 997 lines of type annotations 15

Basic Annotations Type annotation Block (higher-order method) type class String ##% "+" : (String) String ##% insert : (Fixnum, String) String ##% upto : (String) {String Object} String... end 16

Intersection Types class String include? : Fixnum Boolean include? : String Boolean end Meth is both Fixnum Boolean and String Boolean Ex: foo.include?( f ); foo.include?(42); Generally, if x has type A and B, then x is both an A and a B, i.e., x is a subtype of A and of B and thus x has both A s methods and B s methods 17

Intersection Types (cont d) class String slice : (Fixnum) Fixnum slice : (Range) String slice : (Regexp) String slice : (String) String slice : (Fixnum, Fixnum) String slice : (Regexp, Fixnum) String end Intersection types are common in the standard library 74 methods in base_types.rb use them Our types look much like the RDoc descriptions of methods Except we type check the uses of functions We found several places where the RDoc types are wrong (Note: We treat nil as having any type) 18

Optional Arguments class String chomp : () String chomp : (String) String end Ex: foo.chomp( o ); foo.chomp(); By default, chomps $/ Abbreviation: class String chomp : (?String) String end 0 or 1 occurrence 19

Aside: $ in Ruby Global variables begin with $ Here are all the special global variables formed from non-ascii names $! $@ $; $, $/ $\ $. $_ $< $> $$ $? $~ $= $* $` $ $+ $& $0 $: $ $1 $2 $3 $4 $5 $6 $7 $8 $9 (these are local) 20

Variable-length Arguments class String delete : (String, *String) String end Ex: foo.delete( a ); foo.delete( a, b, c ); *arg is equivalent to an unbounded intersection To be sensible Required arguments go first Then optional arguments Then one varargs argument 0 or more occurrences 21

Union Types class A def f() end end class B def f() end end x = (if... then A.new else B.new end) x.f This method invocation is always safe Note: in Java, would make interface I s.t. A < I, B < I Here x has type A or B It s either an A or a B, and we re not sure which one Therefore can only invoke x.m if m is common to both A and B Ex: Boolean short for TrueClass or FalseClass 22

Structural Subtyping Types so far have all been nominal Refer directly to class names Mostly because core standard library is magic - Looks inside of Fixnum, String, etc objects for their contents But Ruby really uses structural or duck typing Basic Ruby op: method dispatch e0.m(e1,..., en) - Look up m in e0, or in classes/modules e0 inherits from - If m has n arguments, invoke m; otherwise raise error Most Ruby code therefore only needs objects with particular methods, rather than objects of a particular class 23

Object Types module Kernel print : (*[to_s : () String]) NilClass end print accepts 0 or more objects with a to_s method Object types are especially useful for native Ruby code: - def f(x) y = x.foo; z = x.bar; end What is the most precise type for f s x argument? - C1 or C2 or... where Ci has foo and bar methods - Bad: closed-world assumption; inflexible; probably does not match programmer s intention - Fully precise object type: [foo:()..., bar:()...] 24

Tuple Types def f() [ 1, true ] end a, b = f # a = 1, b = true f : () Array<Fixnum or Boolean>? Not precise enough to type above example f : () Tuple<Fixnum, Boolean> Tuple<t1,..., tn> = array where elt i has type ti Implicit subtyping between Tuple and Array Tuple<t1,..., tn> < Array<t1 or... or tn> 25

That s the Basic Type System Optional and varargs Intersection and union types Object types Tuple types (Plus the self type, parametric polymorphism (generics), types for mixins, first-class method types, flow-sensitivity for local variables) A fair amount of machinery, but not too bad! 26

Dynamic Features The basic type system works well at the application level Some experimental results coming up shortly But starts to break down if we analyze big libraries Libraries include some interesting dynamic features Typical Ruby program = small app + large libraries 27

Eval, in General class Format ATTRS = [ bold, underscore,...] ATTRS.each do attr code = def #{attr}()... end eval code end end 28

Real-World Eval Example class Format ATTRS = [ bold, underscore,...] ATTRS.each do attr code = def #{attr}()... end eval code end end class Format def bold()... end def underline() end end 29

Real-World Eval Example eval occurs at top level class Format ATTRS = [ bold, underscore,...] ATTRS.each do attr code = def #{attr}()... end eval code end end code can be arbitrarily complex But, will always add the same methods Morally, this code is static, rather than dynamic Idea: execute the code and see what eval does Augment static analysis with this information 30

Profile-Guided Static Analysis eval d strings inserted as source code class Format ATTRS = [ bold, underscore,...] ATTRS.each do attr code = def #{attr}()... end if code = def bold()... end def bold()... end else if code = def underscore()... def underscore()... end else safe_eval code end end 31

Profile-Guided Static Analysis else case adds extra dynamic checks class Format ATTRS = [ bold, underscore,...] ATTRS.each do attr code = def #{attr}()... end if code = def bold()... end def bold()... end else if code = def underscore()... def underscore()... end else safe_eval code end end Checks ensure that any runtime type error blames a string passed to safe_eval 32

Theory of Profiling System Theorem: Translation is faithful Static analysis is seeing a correct projection of the actual runtime behavior Theorem: Translation + static typing is sound Program either executes without getting stuck, or can blame string that wasn t seen before 33

Profiling Effectiveness Analyzed 13 benchmarks, including std-lib code 24,895 LOC in total Found 66 uses of dynamic features Inspected each dynamic feature and categorized how dynamic the usage really was Found all uses could be divided into a few categories All of which are morally static 34

Performance Benchmark Total LoC Time (s) ai4r-1.0 21,589 343 bacon-1.0.0 19,804 335 hashslice-1.0.4 20,694 307 hyde-0.0.4 21,012 345 isi-1.1.4 22,298 373 itcf-1.0.0 23,857 311 memoize-1.2.3 4,171 9 pit-0.0.6 24,345 340 sendq-0.0.1 20,913 320 StreetAddress-1.0.1 24,554 309 sudokusolver-1.4 21,027 388 text-highlight-1.0.2 2,039 2 use-1.2.1 20,796 323 LoC = application Figure 8. + Type libraries inferenceit results uses Type inference problem actually quite complicated These times are reasonable for whole-program analysis 35

Example Errors Found Typos in names Archive::Tar::ClosedStream instead of Archive::Tar::MiniTar::ClosedStream Policy instead of Policies Other standard type errors return rule_not_found if!@values.include?(value) rule_not_found not in scope Program did include a test suite, but this path not taken 36

Example Errors Found (cont d) <% @any_more = Post.find(:first, :offset => (@offset.to_i + @posts_per_page.to_i) + 1, :limit => 1 ) %> Model Post does not exist in the Rails app class Integer def to_bn OpenSSL::BN.new(self) end end BN.new expects String, not Integer 3.to_bn would cause a type error 37

Syntactic Confusion assert_nothing_raised { @hash[ a, b ] = 3, 4 }... assert_kind_of(fixnum, @hash[ a, b ] = 3, 4) First passes [3,4] to the []= method of @hash Second passes 3 to the []= method, passes 4 as last argument of assert_kind_of - Even worse, this error is suppressed at run time due to an undocumented coercion in assert_kind_of 38

Syntactic Confusion (cont d) flash[:notice] = You do not have... +... Programmer intended to concatenate two strings But here the + is parsed as a unary operator whose result is discarded @count, @next, @last = 1 Intention was to assign 1 to all three fields But this actually assigns 1 to @count, and nil to @next and @last 39

DRuby Was Promising, But... Ruby front-end somewhat fragile Tied to Ruby 1.8.7; required fixing for Ruby 1.9, would need to fix again for Ruby 2.0 Phased analysis doesn t match that well with many uses of Ruby Ruby has no compile phase, so somewhat unnatural Limitations of type system require adding a lot more sophistication and complexity Complex type system = hard for programmer to predict 40