Beyond R? The Evolution of Statistical Computing Environments

Similar documents
Overview. Rationale Division of labour between script and C++ Choice of language(s) Interfacing to C++ Performance, memory

Overview. Rationale Division of labour between script and C++ Choice of language(s) Interfacing to C++

Outline. S: past, present and future Some thoughts. The 80s. Interfaces - 60s & 70s. Duncan Temple Lang Department of Statistics UC Davis

PyPy - How to not write Virtual Machines for Dynamic Languages

PERL 6 and PARROT An Overview. Presentation for NYSA The New York System Administrators Association

Semantic Analysis. Lecture 9. February 7, 2018

Building a Multi-Language Interpreter Engine

Lesson 1A - First Java Program HELLO WORLD With DEBUGGING examples. By John B. Owen All rights reserved 2011, revised 2015

Low level security. Andrew Ruef

How Rust views tradeoffs. Steve Klabnik

Parrot VM. Allison Randal Parrot Foundation & O'Reilly Media, Inc.

CSE 413 Languages & Implementation. Hal Perkins Winter 2019 Structs, Implementing Languages (credits: Dan Grossman, CSE 341)

SKILL AREA 304: Review Programming Language Concept. Computer Programming (YPG)

MySQL. The Right Database for GIS Sometimes

These are notes for the third lecture; if statements and loops.


6.001 Notes: Section 15.1

Algebraic Dynamic Programming. ADP Theory II: Algebraic Dynamic Programming and Bellman s Principle

9 th CA 2E/CA Plex Worldwide Developer Conference 1

Speech 2 Part 2 Transcript: The role of DB2 in Web 2.0 and in the IOD World

What is a programming language?

CPSC 427: Object-Oriented Programming

CS143 Final Fall 2009

COPYRIGHTED MATERIAL. An Introduction to Computers That Will Actually Help You in Life. Chapter 1. Memory: Not Exactly 0s and 1s. Memory Organization

Using GitHub to Share with SparkFun a

The name of our class will be Yo. Type that in where it says Class Name. Don t hit the OK button yet.

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

Outline. Introduction Concepts and terminology The case for static typing. Implementing a static type system Basic typing relations Adding context

CS2900 Introductory Programming with Python and C++ Kevin Squire LtCol Joel Young Fall 2007

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573

The role of semantic analysis in a compiler

Artificial Intelligence Prof. Deepak Khemani Department of Computer Science and Engineering Indian Institute of Technology, Madras

Intro. Scheme Basics. scm> 5 5. scm>

VoIP INTERNET-BASED PHONE SYSTEMS CHOCK FULL OF FEATURES

MRI Internals. Koichi Sasada.

Computer Science 4U Unit 1. Programming Concepts and Skills Modular Design

If Statements, For Loops, Functions

Dynamic Languages Strike Back. Steve Yegge Stanford EE Dept Computer Systems Colloquium May 7, 2008

Orb-Weaver if the radiance of thousand suns were burst at once into the sky that might be the splendor of mighty one.

Programming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Programming in C++ Indian Institute of Technology, Kharagpur

Advanced Operations Research Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras

Dynamically-typed Languages. David Miller

Media-Ready Network Transcript

Introduction. A. Bellaachia Page: 1

CSCI-GA Scripting Languages

Excerpt from. Internet Basics. Jennie L. Phipps

1DL321: Kompilatorteknik I (Compiler Design 1) Introduction to Programming Language Design and to Compilation

9/21/17. Outline. Expression Evaluation and Control Flow. Arithmetic Expressions. Operators. Operators. Notation & Placement

PATH FINDING AND GRAPH TRAVERSAL

Erlang and Node.JS or, better, Criteria for evaluating technology. Lev Walkin, CTO

Type Inference. Prof. Clarkson Fall Today s music: Cool, Calm, and Collected by The Rolling Stones

CS240: Programming in C

CS558 Programming Languages

Project 3 Due October 21, 2015, 11:59:59pm

The Very Basics of the R Interpreter

CIO 24/7 Podcast: Tapping into Accenture s rich content with a new search capability

MoarVM. A metamodel-focused runtime for NQP and Rakudo. Jonathan Worthington

As a programmer, you know how easy it can be to get lost in the details

Software metrics for open source systems. Niklas Kurkisuo, Emil Sommarstöm, 25697

Problem Solving through Programming In C Prof. Anupam Basu Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur

Week 2: The Clojure Language. Background Basic structure A few of the most useful facilities. A modernized Lisp. An insider's opinion

Week - 01 Lecture - 04 Downloading and installing Python

MEAP Edition Manning Early Access Program WebAssembly in Action Version 1

A Report on RMI and RPC Submitted by Sudharshan Reddy B

Compilers Project Proposals

Bases de Dades: introduction and organization

The x86 Architecture. ICS312 - Spring 2018 Machine-Level and Systems Programming. Henri Casanova

An aside. Lecture 14: Last time

Australian Informatics Olympiad Thursday 23 August, Information Booklet

CPSC 427: Object-Oriented Programming

Some possible directions for the R engine

CS1 Lecture 3 Jan. 18, 2019

Introduction to.net Framework

The Future of Programming Languages. Will Crichton CS /28/18

MITOCW watch?v=kz7jjltq9r4

Pondering the Problem of Programmers Productivity

Python for Analytics. Python Fundamentals RSI Chapters 1 and 2

Virtual Execution Environments: Opportunities and Challenges

CS1 Lecture 3 Jan. 22, 2018

Programming Languages, Summary CSC419; Odelia Schwartz

Topic 1: Introduction

1DL321: Kompilatorteknik I (Compiler Design 1)

(Refer Slide Time 01:41 min)

Introduction to.net Framework Week 1. Tahir Nawaz

! Broaden your language horizons! Different programming languages! Different language features and tradeoffs. ! Study how languages are implemented

4. Java Project Design, Input Methods

Administration Computers Software Algorithms Programming Languages

Student Success Guide

What do Compilers Produce?

COMP 102: Computers and Computing

Expressions and Data Types CSC 121 Spring 2015 Howard Rosenthal

Playing with bird guts. Jonathan Worthington YAPC::EU::2007

Assigns a number to 110,000 letters/glyphs U+0041 is an A U+0062 is an a. U+00A9 is a copyright symbol U+0F03 is an

Mention driver developers in the room. Because of time this will be fairly high level, feel free to come talk to us afterwards

CHAPTER 16 - VIRTUAL MACHINES

Windows Script Host Fundamentals

Lecture content. Course goals. Course Introduction. TDDA69 Data and Program Structure Introduction

The Environment Model. Nate Foster Spring 2018

Excel Basics Rice Digital Media Commons Guide Written for Microsoft Excel 2010 Windows Edition by Eric Miller

Transcription:

Beyond R? The Evolution of Statistical Computing Environments Jay is Associate Professor of Statistics, Yale University Mike submitted his dissertation on August 31, 2010, and is now Research Faculty, Yale Center for Analytical Science This informal Chicago Meetup presentation previews an Invited Session for JSM 2011 in Miami organized by. We apologize in advance for any errors and omissions.

Outline But what s wrong with R? 1 But what s wrong with R? 2

Duncan starts the discussion... Quoted from his web page, http://www.stat.ucdavis.edu/~duncan/. The goal is to think about where statistical computing should be in the future, e.g. 5 years from now, and to determine how to get practitioners there.

Ross: what s wrong with R? Quotes from his JSM 2010 talk, Lessons Learned, Directions for the Future. Because R now has a large number of users who require a stable platform for getting work done, it is no longer suitable as a base for experimentation and development. R is not very good at handling large-scale problems. The following present particular difficulties. Execution of large amounts of R code. Scalar (element-by-element) computation. Computations on large volumes of data. Some computational problems involve a mix of all of these.

Luke: moving forward... From his UseR! 2010 keynote.... a few possible directions for development in the core R engine over the next 12 to 18 months: Taking advantage of multiple cores for vectorized operations and simple matrix operations. Byte code compilation of R code. Further developments in error handling. Increasing the limit on the size of vector data objects.

What can we do? These points were from Ross 2010 JSM address, Lessons Learned, Directions for the Future. Wait for faster machines. Introduce more vectorisation and take advantage of multicores. Make changes to R to eliminate bottlenecks. Compilation. Use non-copying semantics. Sweep the page clean and look at designs for new languages. Duncan Temple Lang, Brendan McArdle and I have begun examining what such new languages might look like. Ross doesn t like this suggestion, of which Robert seems to be a proponent. We agree with Ross, and note that faster machines aren t really the issue: the issue is having a language that takes full advantage (natively) of current hardware capabilities. R doesn t.

So: what next?

Simon: moving forward with R? From personal communication with Simon: For those interested I have created a branch of R in which I am experimenting with the ideas of Aleph that have to do with the language. This allows us to experiment with the language additions (argument types, function vectors matching, fundamental class type etc.) directly in existing R. This means we can easily test existing code and actually try to use the new features for real work (as opposed to wait for Aleph that needs far more work). https://svn.r-project.org/r/branches/r-exp-r5 I have just branched it today and the only addition are optional function types so you can define functions like function(real x, integer n = 1L,...) {... } but there is a lot more coming.

Luke: a virtual machine for R? From his web pages and his 2010 UseR! address: http://www.cs.uiowa.edu/~luke/r/bytecode.html http://www.cs.uiowa.edu/~luke/r/compiler/ From his README: Snapshot of byte code compiler for R. The current version requires R 2.12.0 or later. Install the compiler package, and look at the help for cmpfun. Virtual machines, and their machine code, are usually specific to the languages they are designed to support. Remember this last point, it will be relevant later, and isn t always correct. We also note that Luke is also working on other changes to R noted in an earlier slide and not referenced here.

Mike, Jay, Bryan,... extending R and more From drunken conversations over the past few years... Memory 8-byte integer indexing Column-major matrices for scalable linear algebra Shared memory for efficient concurrent programming Memory-mapped files (Bryan pushed on this point) Current successes of the Bigmemory Project (http://www.bigmemory.org): 800 GB test matrix on a laptop with a USB drive 160 GB truncated singular value decomposition on the Netflix data (results at end time permitting) Thread safety (not our focus, but not unrelated to the memory discussion)

Simon: Aleph? From Simon s Aleph Wiki: Aleph is an open-source project to create the next generation of statistical computing software, possibly as a successor to R. The goal is to provide a modern, flexible system suitable for statistical analysis. All aspects of the project are currently experimental and up for discussion. The current experimental implementation is written in C and features its own C-level object system. http://www.rforge.net/mediawiki/index.php/aleph http://rforge.net/aleph/

Andrew: CXXR? From http://www.cs.kent.ac.uk/projects/cxxr/ The aim of the CXXR project is gradually to refactor (reengineer) the interpreter of the R language, currently written for the most part in C, into C++, whilst as far as possible retaining full functionality. CXXR is being carried out independently of the main R development and maintenance effort. It is hoped that by reorganising the code along object-oriented lines, by deploying the tighter code encapsulation that is possible in C++, and by improving the internal documentation, the project will make it easier for researchers to develop experimental versions of the R interpreter.

Ross/Duncan: Lisp? This project proposes a completely new language! From Back to the Future: Lisp as a Base for a Statistical Computing System. We don t have the resources to build a new system entirely from scratch. We need some giant shoulders to stand on. On the plus side: Lisp is a well-established, widely-used system There are a multiplicity of high-quality implementations There are very good resources explaining Lisp at both the high and low levels. On the minus side: Lisp has an image problem it is perceived as a dead language. Because Lisp is an amalgam of features there are some inelegances to deal with.

Is that it? A preview of a JSM session? No, there s more. Give the speaker a beer and brace yourselves...

Past But what s wrong with R? First there was S

Present But what s wrong with R? And then there was R

Future? But what s wrong with R? Only one thing makes sense...

Q? But what s wrong with R?

Q? But what s wrong with R?

: Q? $./cleanandbuild.pl Cleaning and building Q for Parrot VM. Copyright (C) 2010, John W. Emerson, Michael J. Kane. Checking src/ for *.pir...found some, ok Removing *.pir in src...ok Building Q, please wait...ok Testing Q, please wait...ok, tests passed. You may now run interactive Q. $

: Q? $./Q Q for Parrot VM version 0.0.1 (2010-10-19) Copyright (C) 2010, John W. Emerson, Michael J. Kane. Q is a case study and comes with ABSOLUTELY NO WARRANTY. You are not welcome to redistribute it under any circumstance. See LICENSE for details. Natural language support? Probably not. Q is a collaborative project with two contributors. Type 'help.start()' to get started and 'q()' to quit Q. >

: Q? > a <- 1 > if (a!=1) print("yikes!") else { print("a is 1") } a is 1 > myfun <- function(x) { return(x+1) } > b <- myfun(a) > print(b) 2 > set_seed(1,2) Seed has been set. > rexp(1) Your exponential: 0.128221109772152 > rexp(1) Your exponential: 0.866102216996375

: Q? Based on the Parrot virtual machine (from http://www.parrot.org): Parrot is a virtual machine designed to efficiently compile and execute bytecode for dynamic languages. Parrot currently hosts a variety of language implementations in various stages of completion, including Tcl, Javascript, Ruby, Lua, Scheme, PHP, Python, Perl 6, APL, and a.net bytecode translator. Parrot is designed to provide interoperability between languages that compile to it. Uses the Artistic License 2.0

: Q? Based on the Parrot virtual machine (from http://www.parrot.org): Parrot exposes a rich interface for high-level languages to use, including several important features: a robust exceptions system compilation into platform-independent bytecode a clean extension and embedding interface a just-in-time compilation to machine code native library interface mechanisms garbage collection support for objects and classes a robust concurrency model Designing a new language or implementing a new compiler for an old language is easier with all of these features designed, implemented, tested, and supported in a VM already. The only tasks required of compiler implementers who target the Parrot platform is the creation of the parser and the language runtime.

Why not Java VM or.net? They don t automatically provide multiple dispatch (in the S4 sense) introspection (as in an object being associated with its methods) whereas Parrot standardizes them for all languages under the umbrella of the Parrot VM. In Parrot, writing a new language is a matter of mapping objects defined in the grammar to the existing facilities of the virtual machine. This is critical for the interoperability between languages.

: Q? Where are we now? The first 3/4 of this talk was completely serious, and previews an exciting Invited Session for JSM 2011 in Miami. The last 1/4 was intended partly in jest: R is here to stay this is for fun. Q exists (barely) as an experimental project at the moment. Best case? Q may become an open-source experimental project seeking collaborators in the near future. Or it may die quietly. Unless a new language/environment can provide almost complete backwards-compatibility with R, we don t think it will take off. And that level of compatibility seems unlikely. Parrot is still evolving. An original basic Q parser working under Parrot 2.4.0 ceased working under 2.6.0; 2.8.0 was just released last month.

Is this for real? Not really. But... Code will go up somewhere by the end of November (we hope). There is no "Q Core" at this point, but collaborators are welcome. If there is ever any sense of inertia, at that point we ll start thinking about license issues and formal project organization, etc... Could this be a candidate for GSOC?

Q parser grammar example rule block { {*} #= open <maybenewline> '{' <maybenewline> <statement>* <maybenewline> '}' <maybenewline> {*} #= close <statement>? {*} #= close } rule if_statement { 'if' '(' <expression> ')' <block> ['else' <else=block>]? {*} }

Q action example (nqp: not quite Perl) method if_statement($/) { my $cond := $<expression>.ast; my $then := $<block>.ast; my $past := PAST::Op.new( $cond, $then, :pasttype('if'), :node($/) ); } ## if there's an else clause, ## add it to the PAST node. if $<else> { $past.push( $<else>[0].ast ); } make $past;

Netflix: a truncated singular value decomposition ST ST ST ST AP AP AP SW: I SW: II LOTR(ex) SW SW SW LOTR HP LOTR HP HP The Horror Within Horror Hospital The The West West Wing: Wing: Season Season 1 24 3 Waterworld The Shawshank Redemption Rain Man Braveheart Mission: Impossible Men in Black Men in Black II Titanic The Rock The Green Mile Independence Day