Accelerating Ruby with LLVM

Similar documents
Expressing high level optimizations within LLVM. Artur Pilipenko

code://rubinius/technical

Towards Lean 4: Sebastian Ullrich 1, Leonardo de Moura 2.

Announcements. My office hours are today in Gates 160 from 1PM-3PM. Programming Project 3 checkpoint due tomorrow night at 11:59PM.

LLVM for a Managed Language What we've learned

Pegarus & Poison. Rubinius VM as a Multilanguage Platform

BEAMJIT, a Maze of Twisty Little Traces

Where We Are. Lexical Analysis. Syntax Analysis. IR Generation. IR Optimization. Code Generation. Machine Code. Optimization.

BEAMJIT: An LLVM based just-in-time compiler for Erlang. Frej Drejhammar

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1

SSA Based Mobile Code: Construction and Empirical Evaluation

SABLEJIT: A Retargetable Just-In-Time Compiler for a Portable Virtual Machine p. 1

High-Level Language VMs

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler

CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1

Complex, concurrent software. Precision (no false positives) Find real bugs in real executions

LLVM Summer School, Paris 2017

Traditional Smalltalk Playing Well With Others Performance Etoile. Pragmatic Smalltalk. David Chisnall. August 25, 2011

Truffle A language implementation framework

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573

Contents in Detail. Who This Book Is For... xx Using Ruby to Test Itself... xx Which Implementation of Ruby?... xxi Overview...

CS153: Compilers Lecture 15: Local Optimization

Lecture 3 Overview of the LLVM Compiler

Intermediate Representations

Faster Programs with Guile 3

Parsing Scheme (+ (* 2 3) 1) * 1

Today. Instance Method Dispatch. Instance Method Dispatch. Instance Method Dispatch 11/29/11. today. last time

Building a Compiler with. JoeQ. Outline of this lecture. Building a compiler: what pieces we need? AKA, how to solve Homework 2

FTL WebKit s LLVM based JIT

MoarVM. A metamodel-focused runtime for NQP and Rakudo. Jonathan Worthington

a translator to convert your AST representation to a TAC intermediate representation; and

HSA Foundation! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room (Bld 20)! 15 December, 2017!

ECE 5775 (Fall 17) High-Level Digital Design Automation. Static Single Assignment

CSc 453 Interpreters & Interpretation

WELCOME TO PERL = Tuesday, June 4, 13

Combined Object-Lambda Architectures

Trace-based JIT Compilation

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

Python Implementation Strategies. Jeremy Hylton Python / Google

Field Analysis. Last time Exploit encapsulation to improve memory system performance

HOT-Compilation: Garbage Collection

CSE 401 Final Exam. March 14, 2017 Happy π Day! (3/14) This exam is closed book, closed notes, closed electronics, closed neighbors, open mind,...

Java: framework overview and in-the-small features

Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology

Flow-sensitive rewritings and Inliner improvements for the Graal JIT compiler

PyPy - How to not write Virtual Machines for Dynamic Languages

Playing with bird guts. Jonathan Worthington YAPC::EU::2007

Sista: Improving Cog s JIT performance. Clément Béra

Compiler Construction: LLVMlite

G Programming Languages - Fall 2012

<Insert Picture Here> Symmetric multilanguage VM architecture: Running Java and JavaScript in Shared Environment on a Mobile Phone

Semantic Analysis. Lecture 9. February 7, 2018

Intermediate Code & Local Optimizations

CS 406/534 Compiler Construction Putting It All Together

Dynamic Dispatch and Duck Typing. L25: Modern Compiler Design

Compiler Construction Lent Term 2013 Lectures 9 & 10 (of 16)

COP4020 Programming Languages. Compilers and Interpreters Robert van Engelen & Chris Lacher

Self-Optimizing AST Interpreters

Lisp to Ruby to Rubinius

An Implementation of Python for Racket. Pedro Palma Ramos António Menezes Leitão

Intermediate Representations

Dynamic Languages Strike Back. Steve Yegge Stanford EE Dept Computer Systems Colloquium May 7, 2008

6.828: OS/Language Co-design. Adam Belay

MRI Internals. Koichi Sasada.

Tutorial: Building a backend in 24 hours. Anton Korobeynikov

The role of semantic analysis in a compiler

CS558 Programming Languages

Compiling Techniques

Java Internals. Frank Yellin Tim Lindholm JavaSoft

Skip the FFI! Embedding Clang for C Interoperability. Jordan Rose Compiler Engineer, Apple. John McCall Compiler Engineer, Apple

Running class Timing on Java HotSpot VM, 1

Project. there are a couple of 3 person teams. a new drop with new type checking is coming. regroup or see me or forever hold your peace

Chapter 1 GETTING STARTED. SYS-ED/ Computer Education Techniques, Inc.

CS558 Programming Languages

A Propagation Engine for GCC

G Programming Languages Spring 2010 Lecture 4. Robert Grimm, New York University

Adobe Flash Player ActionScript Virtual Machine (Tamarin)

Code Generation. Lecture 19

Administration CS 412/413. Why build a compiler? Compilers. Architectural independence. Source-to-source translator

Rui Wang, Assistant professor Dept. of Information and Communication Tongji University.

A Status Update of BEAMJIT, the Just-in-Time Compiling Abstract Machine. Frej Drejhammar and Lars Rasmusson

VA Smalltalk 9: Exploring the next-gen LLVM-based virtual machine

Reference Counting. Reference counting: a way to know whether a record has other users

FP Implementation. Simon Marlow PLMW 15

HSA foundation! Advanced Topics on Heterogeneous System Architectures. Politecnico di Milano! Seminar Room A. Alario! 23 November, 2015!

CS 426 Fall Machine Problem 4. Machine Problem 4. CS 426 Compiler Construction Fall Semester 2017

Run-time Program Management. Hwansoo Han

The Compiler So Far. CSC 4181 Compiler Construction. Semantic Analysis. Beyond Syntax. Goals of a Semantic Analyzer.

Generating Code for Assignment Statements back to work. Comp 412 COMP 412 FALL Chapters 4, 6 & 7 in EaC2e. source code. IR IR target.

Run-time Environments. Lecture 13. Prof. Alex Aiken Original Slides (Modified by Prof. Vijay Ganesh) Lecture 13

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder

A Translation Framework for Automatic Translation of Annotated LLVM IR into OpenCL Kernel Function

Programming Assignment IV Due Thursday, June 1, 2017 at 11:59pm

CSE 501: Compiler Construction. Course outline. Goals for language implementation. Why study compilers? Models of compilation

Runtime. The optimized program is ready to run What sorts of facilities are available at runtime

Principles of Compiler Design

JamaicaVM Java for Embedded Realtime Systems

Eliminating Global Interpreter Locks in Ruby through Hardware Transactional Memory

DragonEgg - the Fortran compiler

LLVM for the future of Supercomputing

Transcription:

Accelerating Ruby with LLVM Evan Phoenix Oct 2, 2009

RUBY

RUBY Strongly, dynamically typed

RUBY Unified Model

RUBY Everything is an object

RUBY 3.class # => Fixnum

RUBY Every code context is equal

RUBY Every context is a method

RUBY Garbage Collected

RUBY A lot of syntax

RUBY Strongly, dynamically typed Unified model Everything is an object 3.class Every code context is equal Every context is a method Garbage collected A lot of syntax

Rubinius

Rubinius Started in 2006

Rubinius Build a ruby environment for fun

Rubinius Unlike most scripting languages, write as much in ruby as possible

Rubinius Core functionality of perl/python/ruby in C, NOT in their respective language.

Rubinius C => ruby => C => ruby

Rubinius Language boundaries suck

Rubinius Started in 2006 Built for fun Turtles all the way down

Evolution

Evolution 100% ruby prototype running on 1.8

Evolution Hand translated VM to C

Evolution Rewrote VM in C++

Evolution Switch away from stackless

Evolution Experimented with handwritten assembler for x86

Evolution Switch to LLVM for JIT

Evolution 100% ruby prototype Hand translated VM to C Rewrote VM in C++ Switch away from stackless Experiment with assembler Switch to LLVM for JIT

Features

Features Bytecode VM

Features Simple interface to native code

Features Accurate, generational garbage collector

Features Integrated FFI API

Features Bytecode VM Interface to native code Generational GC Integrated FFI

Benchmarks

def foo() ary = [] 100.times { i ary << i } end 300,000 times

Seconds 9 8.02 6.75 5.90 4.5 5.30 3.60 2.25 2.59 0 1.8 1.9 rbx rbx jit rbx jit +blocks

def foo() hsh = {} 100.times { i hsh[i] = 0 } end 100,000 times

Seconds 30 25.36 22.5 15 12.54 12.01 7.5 4.85 5.26 0 1.8 1.9 rbx rbx jit rbx jit +blocks

def foo() hsh = { 47 => true } 100.times { i hsh[i] } end 100,000 times

Seconds 7 6.26 5.25 3.5 3.64 2.68 2.66 1.75 2.09 0 1.8 1.9 rbx rbx jit rbx jit +blocks

Early LLVM Usage

Early LLVM Usage Compiled all methods up front

Early LLVM Usage Simple opcode-to-function translation with inlining

Early LLVM Usage Startup went from 0.3s to 80s

Early LLVM Usage Compiled all methods upfront Simple opcode-to-function translation Startup from 0.3s to 80s

True JIT

True JIT JIT Goals

True JIT JIT Goals Choose methods that benefit the most

True JIT JIT Goals Compiling has minimum impact on performance

True JIT JIT Goals Ability to make intelligent frontend decisions

Choosing Methods

Choosing Methods Simple call counters

Choosing Methods When counter trips, the fun starts

Choosing Methods Room for improvement

Choosing Methods Room for improvement Increment counters in loops

Choosing Methods Room for improvement Weigh different invocations differently

Choosing Methods Simple counters Trip the counters, do it Room for improvement Increment in loops Weigh invocations

Which Method?

Which Method? Leaf methods trip quickly

Which Methods? Leaf methods trip quickly Consider the whole callstack

Which Methods? Leaf methods trip quickly Pick a parent expecting inlining

Which Method? Leaf methods trip Consider the callstack Find a parent

Minimal Impact

Minimal Impact After the counters trip

Minimal Impact Queue the method

Minimal Impact Background thread drains queue

Minimal Impact Frontend, passes, codegen in background

Minimal Impact Install JIT d function

Minimal Impact Install JIT d function Requires GC interaction

Minimal Impact Trip the counters Queue the method Compile in background Install function pointer

Good Decisions

Good Decisions Naive translation yields fixed improvement

Good Decisions Performance shifts to method dispatch

Good Decisions Improve optimization horizon

Good Decisions Inline using type feedback

Good Decisions Naive translation sucks Inline using type feedback Performance in dispatch Improve optimizations

Type Feedback

Type Feedback Frontend translates to IR

Type Feedback Read InlineCache information

Type Feedback InlineCaches contain profiling info

Type Feedback Use profiling to drive inlining!

Type Feedback Frontend generates IR Reads InlineCaches InlineCaches have profiling Use profiling to drive inlining!

Inlining

Inlining Profiling info shows a dominant class

2 1% 1 class 98%

Inlining Lookup method in compiler

Inlining For native functions, emit direct call

Inlining For FFI, inline conversions and call

Inlining Find dominant class Lookup method Emit direct calls if possible

Inlining Ruby

Inlining Ruby Policy decides on inlining

Inlining Ruby Drive sub-frontend at call site

Inlining Ruby All inlining occurs in the frontend

Inlining Ruby Generated IR preserves runtime data

Inlining Ruby Generated IR preserves runtime data GC roots, backtraces, etc

Inlining Ruby No AST between bytecode and IR

Inlining Ruby No AST between bytecode and IR Fast, but limits the ability to generate better IR

Inlining Ruby Policy decides Drive sub-frontend Preserve runtime data Generates fast, ugly IR

LLVM

LLVM IR uses operand stack

LLVM IR uses operand stack Highlevel data flow not in SSA

LLVM IR uses operand stack Passes eliminate redundencies

LLVM IR uses operand stack Makes GC stack marking easy

LLVM IR uses operand stack nocapture improves propagation

LLVM Exceptions via sentinal value

LLVM Exceptions via sentinal value Nested handlers use branches for control

LLVM Exceptions via sentinal value Inlining exposes redundant checks

LLVM Inline guards

LLVM Inline guards Simple type guards

if(obj->class->class_id == <integer constant>) {

LLVM Inline guards Custom AA pass for guard elimination

LLVM Inline guards Teach pointstoconstantmemory about...

if(obj->class->class_id == <integer constant>) {

if(obj->class->class_id == <integer constant>) {

LLVM Maximizing constant propagation

LLVM Maximizing constant propagation Type failures shouldn t contribute values

if(obj->class->class_id == 0x33) { val = 0x7; } else { val = send_msg(state, obj,...); }

if(obj->class->class_id == 0x33) { val = 0x7; } else { return uncommon(state); }

LLVM Maximizing constant propagation Makes JIT similar to tracing

LLVM Use overflow intrinsics

LLVM Use overflow intrinsics Custom pass to fold constants arguments

LLVM AA knowledge for tagged pointers

LLVM AA knowledge of tagged pointers 0x5 is 2 as a tagged pointer

LLVM Not in SSA form Simplistic exceptions Inlining guards Maximize constants Use overflow Tagged pointer AA

Issues

Issues How to link with LLVM?

Issues How to link with LLVM? An important SCM issue

Issues Ugly, confusing IR from frontend

Issues instcombine confuses basicaa

Issues Operand stack confuses AA

Issues Inability to communicate semantics

Object* new_object(state)

Returned pointer aliases nothing Only modifies state If return value is unused, remove the call Semi-pure?

Issues Ugly IR Linking with LLVM AA confusion Highlevel semantics

Thanks! http://rubini.us ephoenix@engineyard.com