Running R Faster. Tomas Kalibera

Similar documents
Optimizing R VM: Interpreter-level Specialization and Vectorization

code://rubinius/technical

Optimization Techniques

R FUTURE DIRECTIONS. Speaker. FastR (speedup 8.5x GNURast) What have we done so far. BRIXEN June Jan Vitek. Professor of Computer Science

Contents in Detail. Who This Book Is For... xx Using Ruby to Test Itself... xx Which Implementation of Ruby?... xxi Overview...

Parsing Scheme (+ (* 2 3) 1) * 1

ASSIGNMENT OF MASTER S THESIS

CS 360 Programming Languages Interpreters

CSc 453 Interpreters & Interpretation

Programming Languages Third Edition. Chapter 10 Control II Procedures and Environments

Playing with bird guts. Jonathan Worthington YAPC::EU::2007

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1

Simple Dynamic Compilation with GOO. Jonathan Bachrach. MIT AI Lab 01MAR02 GOO 1

ALTREP and Other Things

Advances in Memory Management and Symbol Lookup in pqr

CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1

Design Issues. Subroutines and Control Abstraction. Subroutines and Control Abstraction. CSC 4101: Programming Languages 1. Textbook, Chapter 8

The Java Programming Language

Managed runtimes & garbage collection

Managed runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley

Some possible directions for the R engine

Jython Python for the Java Platform. Jim Baker, Committer twitter.com/jimbaker zyasoft.com/pythoneering

Python Implementation Strategies. Jeremy Hylton Python / Google

Java Internals. Frank Yellin Tim Lindholm JavaSoft

Sista: Improving Cog s JIT performance. Clément Béra

C++ for System Developers with Design Pattern

CS 330 Lecture 18. Symbol table. C scope rules. Declarations. Chapter 5 Louden Outline

Run-time Program Management. Hwansoo Han

Towards Lean 4: Sebastian Ullrich 1, Leonardo de Moura 2.

StackVsHeap SPL/2010 SPL/20

A Fast Abstract Syntax Tree Interpreter for R

Dispatch techniques and closure representations

RubyConf 2005 Oct. 14 SASADA Koichi Tokyo University of Agriculture and Technology Nihon Ruby no Kai

Intermediate Code Generation

Some Improvements of the Byte-code Compiler Problems in Existing R/C Code

Namespaces, Source Code Analysis, and Byte Code Compilation

Vertical Profiling: Understanding the Behavior of Object-Oriented Applications

Native Interfaces for R

Building a Multi-Language Interpreter Engine

CSc 520 Principles of Programming Languages

Short Notes of CS201

CS5460/6460: Operating Systems. Lecture 21: Shared libraries. Anton Burtsev March, 2014

Cog VM Evolution. Clément Béra. Thursday, August 25, 16

CS201 - Introduction to Programming Glossary By

Assigning to a Variable

Delft-Java Link Translation Buffer

G Programming Languages - Fall 2012

Announcements. My office hours are today in Gates 160 from 1PM-3PM. Programming Project 3 checkpoint due tomorrow night at 11:59PM.

WELCOME TO PERL = Tuesday, June 4, 13

CSc 453. Compilers and Systems Software. 18 : Interpreters. Department of Computer Science University of Arizona. Kinds of Interpreters

Principles of Compiler Design

invokedynamic IN 45 MINUTES!!! Wednesday, February 6, 13

Implementing Higher-Level Languages. Quick tour of programming language implementation techniques. From the Java level to the C level.

SABLEJIT: A Retargetable Just-In-Time Compiler for a Portable Virtual Machine p. 1

Object-Oriented Principles and Practice / C++

Heap, Variables, References, and Garbage. CS152. Chris Pollett. Oct. 13, 2008.

HotPy (2) Binary Compatible High Performance VM for Python. Mark Shannon

Object typing and subtypes

Java Primer 1: Types, Classes and Operators

Chapter 9 :: Subroutines and Control Abstraction

Smalltalk Implementation

1.3.4 case and case* macro since 1.2. Listing Conditional Branching, Fast Switch. Listing Contract

EDAN65: Compilers, Lecture 13 Run;me systems for object- oriented languages. Görel Hedin Revised:

S.No Question Blooms Level Course Outcome UNIT I. Programming Language Syntax and semantics

PERL 6 and PARROT An Overview. Presentation for NYSA The New York System Administrators Association

CSc 553. Principles of Compilation. 2 : Interpreters. Department of Computer Science University of Arizona

Interprocedural Type Specialization of JavaScript Programs Without Type Analysis

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler

Threads. CS3026 Operating Systems Lecture 06

G Programming Languages Spring 2010 Lecture 4. Robert Grimm, New York University

Concepts Introduced in Chapter 7

Functional Programming. Pure Functional Programming

Separating Access Control Policy, Enforcement, and Functionality in Extensible Systems. Robert Grimm University of Washington

VM instruction formats. Bytecode translator

Chapter 11: Compiler II: Code Generation

The basic operations defined on a symbol table include: free to remove all entries and free the storage of a symbol table

invokedynamic under the hood

Operational Semantics. One-Slide Summary. Lecture Outline

Evolution of Programming Languages

The Java Language Implementation

Java On Steroids: Sun s High-Performance Java Implementation. History

Operating Systems CMPSCI 377, Lec 2 Intro to C/C++ Prashant Shenoy University of Massachusetts Amherst

Under the Hood: The Java Virtual Machine. Lecture 23 CS2110 Fall 2008

<Insert Picture Here> Implementing lambda expressions in Java

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

Distributed Systems. The main method of distributed object communication is with remote method invocation

CSE 501: Compiler Construction. Course outline. Goals for language implementation. Why study compilers? Models of compilation

MethodHandle implemention tips and tricks

Control Abstraction. Hwansoo Han

Computer Components. Software{ User Programs. Operating System. Hardware

Where do we go from here?

Traditional Smalltalk Playing Well With Others Performance Etoile. Pragmatic Smalltalk. David Chisnall. August 25, 2011

Index. object lifetimes, and ownership, use after change by an alias errors, use after drop errors, BTreeMap, 309

Translating JVM Code to MIPS Code 1 / 43

Object-Oriented Distributed Technology

Atelier Java - J1. Marwan Burelle. EPITA Première Année Cycle Ingénieur.

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Programming Language Implementation

Programming Languages Third Edition. Chapter 7 Basic Semantics

JavaScript and V8 A functional-ish language and implementation in the mainstream

Transcription:

Running R Faster Tomas Kalibera

My background: computer scientist, R user.

My FastR experience: Implementing a new R VM in Java. New algorithms, optimizations help Frame representation, variable lookup Function calls and argument matching Specialized data types Code specialization Lazy arithmetics with profiling views Implementing a new R VM is hard Specification Tightly coupled packages and the VM VEE'14: A fast abstract syntax tree interpreter for R

My current work: speeding-up GNU-R. With Luke Tierney, Jan Vitek ML benchmarks from TU Dortmund github.com/kalibera/rexp Based on R-dev 65969 (June 18), check-all passes.

github.com/kalibera/rexp Shootout benchmarks

github.com/kalibera/rexp AT&T Benchmarks (Benchmark 25)

Compiler bytecode-optimizations. Inlining constants into bytecode Inlining labels into bytecode function(x) { for(i in 1:10) { x <- x + 1 } x } LDCONST.OP, 1L, STARTFOR.OP, 3L, 2L, 16L, 7: GETVAR.OP, 4L, LDCONST.OP, 5L, ADD.OP, 6L, SETVAR.OP, 4L, POP.OP, 16: STEPFOR.OP, 7L, ENDFOR.OP, POP.OP, GETVAR.OP, 4L, RETURN.OP 1: 1:10, 2: i, 3: for (i in 1:10) { x <- x + 1 }, 4: x, 5: 1, 6: x + 1

Compiler optimizations variable access. Special instruction for creating a promise that just reads a variable Faster variable access for builtins (uses cache) Constant-pool re-ordering Variable are first, which reduces memory overhead of the binding cache and improves locality Frames in R are implemented using linked lists. A binding cache stores, for each constant in the constant pool, a reference to the corresponding element of the linked list.

Stack-allocation of call arguments. (primarily in the compiler) Call arguments passed as linked-list Special stack-based memory region Growable, shrinkable stack for fixed-size call argument cells Special treatment by the GC Support for long-jumps via contexts Better locality, faster reclamation In R, the list of function arguments (promises) passed to a function are kept around for the duration of the function call, because they'll become needed in the case of object dispatch.

Explicit argument passing (no linked lists). For (many) builtins and internals For closures called positionally Lists are only created lazily if needed get(x, envir, mode, inherits) SEXP attribute_hidden do_get(sexp call, SEXP op, SEXP args, SEXP rho) if (!isvalidstringf(car(args))) if (TYPEOF(CADR(args)) == REALSXP) if (isstring(caddr(args))) ginherits = aslogical(cadddr(args)); do_earg_get(sexp call, SEXP op, SEXP arg_x, SEXP arg_envir, SEXP arg_mode, SEXP arg_inherits, SEXP rho)

Inlining wrappers to foreign calls. rnorm <- function (n, mean = 0, sd = 1).External(C_rnorm, n, mean, sd) Inlining avoids overhead of promise creation, argument matching, environment creation Explicit passing of arguments to.call foreign calls (avoiding linked list) Updating external pointer at load time C_rnorm in the example is a variable in the `stats` namespace, which is automatically created when `stats` package is loaded and it points to a registered native symbol (R object). This object contains an external pointer (R structure), which contains a physical pointer to the `rnorm` routine implemented in the C code of the `stats` package.

Object dispatch (S3/S4) optimizations. method.class method#class1#class2#class3 Faster signature creation Avoid name allocation Re-use hashcode of first term method Comparison using == (instead of strcmp) Fast-path optimizations During method dispatch, one needs an R symbol for a signature (S3 or S4). A symbol has to be looked up in a hash table, based on its string name. Strings in R are however also interned (STRSXPs), and remember their hashes.

Summary GNU-R performance for real applications can be improved without changing current semantics Avoiding linked lists for function arguments Optimizing dispatch of stats functions, S3/S4 dispatch Optimizing string operations Smaller clean-ups (symbol, charsxp shortcuts, etc) I'm working with Luke Tierney on merging some of these improvements