Learning R via Python...or the other way around

Similar documents
LECTURE 1. Getting Started with Python

SD314 Outils pour le Big Data

OOP and Scripting in Python

1 Decorators. 2 Descriptors. 3 Static Variables. 4 Anonymous Classes. Sandeep Sadanandan (TU, Munich) Python For Fine Programmers July 13, / 19

CSC 221: Introduction to Programming. Fall 2013

Inverse Ray Shooting Tutorial. Jorge Jiménez Vicente Dpto. Física Teórica y del Cosmos Universidad de Granada Spain

Coding Styles for Python

LECTURE 1. Getting Started with Python

Cosmology with python: Beginner to Advanced in one week. Tiago Batalha de Castro

Lecture 4: Strings and documentation

Extending Jython. with SIM, SPARQL and SQL

CIS192: Python Programming

Python: Its Past, Present, and Future in Meteorology

CS Programming Languages: Python

Python as a First Programming Language Justin Stevens Giselle Serate Davidson Academy of Nevada. March 6th, 2016

The current topic: Python. Announcements. Python. Python

Senthil Kumaran S

Introduction to Python

A polyglot day: learning from language paradigms. Benson Joeris Kathleen Dollard

Introduction to Python. Fang (Cherry) Liu Ph.D. Scien5fic Compu5ng Consultant PACE GATECH

Research Computing with Python, Lecture 1

Introduction to Python

Introduction to Python. Didzis Gosko

Webinar Series. Introduction To Python For Data Analysis March 19, With Interactive Brokers

COSC345 Software Engineering. Documentation

Lecture 1. Getting Started with Python

Be careful when deciding whether to represent data as integers or floats, and be sure that you consider all possible behaviors in computation.

Rapid Application Development with

The Pyth Language. Administrivia

2.Raspberry PI: Architecture & Hardware Specifications

Python for Earth Scientists

Accelerating Information Technology Innovation

Functions, Scope & Arguments. HORT Lecture 12 Instructor: Kranthi Varala

Course Title: Python + Django for Web Application

Introduction to Python: Data types. HORT Lecture 8 Instructor: Kranthi Varala

Data type built into Python. Dictionaries are sometimes found in other languages as associative memories or associative arrays.

Scientific Computing using Python

C++ for Python Programmers

GE PROBLEM SOVING AND PYTHON PROGRAMMING. Question Bank UNIT 1 - ALGORITHMIC PROBLEM SOLVING

Python Programming, bridging course 2011

4.2 Function definitions the basics

Weekly Discussion Sections & Readings

Programming with Python

Accelerating Information Technology Innovation

AMath 483/583 Lecture 7. Notes: Notes: Changes in uwhpsc repository. AMath 483/583 Lecture 7. Notes:

Shell / Python Tutorial. CS279 Autumn 2017 Rishi Bedi

AMath 483/583 Lecture 7

Lecture 16: Static Semantics Overview 1

Loops and Conditionals. HORT Lecture 11 Instructor: Kranthi Varala

Extended Introduction to Computer Science CS1001.py Lecture 10, part A: Interim Summary; Testing; Coding Style

CHAPTER 2: Introduction to Python COMPUTER PROGRAMMING SKILLS

LECTURE 2. Python Basics

Python at Glance. a really fast (but complete) ride into the Python hole. Paolo Bellagente - ES3 - DII - UniBS

CS 111X - Spring Final Exam - KEY

Python Workshop. January 18, Chaitanya Talnikar. Saket Choudhary

PYTHON FOR MEDICAL PHYSICISTS. Radiation Oncology Medical Physics Cancer Care Services, Royal Brisbane & Women s Hospital

This course is designed for anyone who needs to learn how to write programs in Python.

Scientific Computing

OOP and Scripting in Python Advanced Features

For your convenience Apress has placed some of the front matter material after the index. Please use the Bookmarks and Contents at a Glance links to

Functional Programming and Haskell

DISCRETE ELEMENT METHOD (DEM) SIMULATION USING OPEN-SOURCE CODES

List comprehensions (and other shortcuts) UW CSE 160 Spring 2015

Python Basics. Lecture and Lab 5 Day Course. Python Basics

CS 220: Introduction to Parallel Computing. Beginning C. Lecture 2

Decision Making in C

Introduction to Python

Python for Non-programmers

Getting Started. Office Hours. CSE 231, Rich Enbody. After class By appointment send an . Michigan State University CSE 231, Fall 2013

Python a modern scripting PL. Python

lambda forms map(), reduce(), filter(), eval(), and apply() estimating π with list comprehensions

Boolean Expressions and if 9/14/2007

Introduction to Typed Racket. The plan: Racket Crash Course Typed Racket and PL Racket Differences with the text Some PL Racket Examples

And Parallelism. Parallelism in Prolog. OR Parallelism

Introduction to Bioinformatics

Interactive use. $ python. >>> print 'Hello, world!' Hello, world! >>> 3 $ Ctrl-D

Control Structures. Lecture 4 COP 3014 Fall September 18, 2017

6 Initializing Abstract Models with Data Command Files Model Data The set Command Simple Sets... 68

Introduction to Python Part 2

PYTHON DATA SCIENCE TOOLBOX II. List comprehensions

Real Python: Python 3 Cheat Sheet

Introduction to Python - Part I CNV Lab

Python memento TI-Smart Grids

Intro to Programming. Unit 7. What is Programming? What is Programming? Intro to Programming

If Statements, For Loops, Functions

The SPL Programming Language Reference Manual

Speed up your Python code

IST311. Advanced Issues in OOP: Inheritance and Polymorphism

G Programming Languages - Fall 2012

Python Basics 본자료는다음의웹사이트를정리한내용이니참조바랍니다. PythonBasics

Interactive use. $ python. >>> print 'Hello, world!' Hello, world! >>> 3 $ Ctrl-D

ARTIFICIAL INTELLIGENCE AND PYTHON

Intro. Scheme Basics. scm> 5 5. scm>

Python review. 1 Python basics. References. CS 234 Naomi Nishimura

Chapter 7. Expressions and Assignment Statements ISBN

CIS192 Python Programming. Robert Rand. August 27, 2015

Advanced Programming Techniques. Python. Christopher Moretti

Software Development. Integrated Software Environment

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division. P. N. Hilfinger

Flat triples approach to RDF graphs in JSON

Transcription:

Learning R via Python...or the other way around Dept. of Politics - NYU January 7, 2010

What We ll Cover Brief review of Python The Zen of Python How are R and Python the same, and how are they different Similar Data Structures Python dictionary R list Example - Testing if an integer is prime Create a function and returning a value Using while-loops Bridging the gap with RPy2 Running a regression with R inside Python Python resources

Fundamental Elements of Python The Python coder s creed >>> import this The Zen of Python, by Tim Peters Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Sparse is better than dense. Readability counts. Special cases aren t special enough to break the rules. Although practicality beats purity. Errors should never pass silently. Unless explicitly silenced. In the face of ambiguity, refuse the temptation to guess. There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you re Dutch. Now is better than never. Although never is often better than *right* now. If the implementation is hard to explain, it s a bad idea. If the implementation is easy to explain, it may be a good idea. Namespaces are one honking great idea -- let s do more of those!

How are R and Python alike, and how are they different? Similarities Differences

How are R and Python alike, and how are they different? Similarities Functional programming paradigm Differences Array of squared values # In Python we use a lambda nested in map >>> map(lambda x: x**2,range(1,11)) [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

How are R and Python alike, and how are they different? Similarities Functional programming paradigm Differences Array of squared values # In Python we use a lambda nested in map >>> map(lambda x: x**2,range(1,11)) [1, 4, 9, 16, 25, 36, 49, 64, 81, 100] # In R we define a function in sapply > sapply(1:10,function(x){return(x**2)}) [1] 1 4 9 16 25 36 49 64 81 100

How are R and Python alike, and how are they different? Similarities Functional programming paradigm Differences Array of squared values # In Python we use a lambda nested in map >>> map(lambda x: x**2,range(1,11)) [1, 4, 9, 16, 25, 36, 49, 64, 81, 100] # In R we define a function in sapply > sapply(1:10,function(x){return(x**2)}) [1] 1 4 9 16 25 36 49 64 81 100 Object-oriented programming Both languages support robust class structures In R, there are the S3 and S4 classes

How are R and Python alike, and how are they different? Similarities Functional programming paradigm Differences Array of squared values # In Python we use a lambda nested in map >>> map(lambda x: x**2,range(1,11)) [1, 4, 9, 16, 25, 36, 49, 64, 81, 100] # In R we define a function in sapply > sapply(1:10,function(x){return(x**2)}) [1] 1 4 9 16 25 36 49 64 81 100 Object-oriented programming Both languages support robust class structures In R, there are the S3 and S4 classes Easily call C/Fortran code Due to their high-level, both languages provide functionality to call compiled code from lower-level languages

How are R and Python alike, and how are they different? Similarities Functional programming paradigm Array of squared values # In Python we use a lambda nested in map >>> map(lambda x: x**2,range(1,11)) [1, 4, 9, 16, 25, 36, 49, 64, 81, 100] # In R we define a function in sapply > sapply(1:10,function(x){return(x**2)}) [1] 1 4 9 16 25 36 49 64 81 100 Differences General purpose OOP vs. Functional OOP Python is a high-level language for application development R is a collection of functions for data analysis Object-oriented programming Both languages support robust class structures In R, there are the S3 and S4 classes Easily call C/Fortran code Due to their high-level, both languages provide functionality to call compiled code from lower-level languages

How are R and Python alike, and how are they different? Similarities Functional programming paradigm Array of squared values # In Python we use a lambda nested in map >>> map(lambda x: x**2,range(1,11)) [1, 4, 9, 16, 25, 36, 49, 64, 81, 100] # In R we define a function in sapply > sapply(1:10,function(x){return(x**2)}) [1] 1 4 9 16 25 36 49 64 81 100 Object-oriented programming Both languages support robust class structures In R, there are the S3 and S4 classes Easily call C/Fortran code Due to their high-level, both languages provide functionality to call compiled code from lower-level languages Differences General purpose OOP vs. Functional OOP Python is a high-level language for application development R is a collection of functions for data analysis Syntax structure Python syntax forces readability through whitespace R emphasizes parsimony and nesting

How are R and Python alike, and how are they different? Similarities Functional programming paradigm Array of squared values # In Python we use a lambda nested in map >>> map(lambda x: x**2,range(1,11)) [1, 4, 9, 16, 25, 36, 49, 64, 81, 100] # In R we define a function in sapply > sapply(1:10,function(x){return(x**2)}) [1] 1 4 9 16 25 36 49 64 81 100 Object-oriented programming Both languages support robust class structures In R, there are the S3 and S4 classes Easily call C/Fortran code Due to their high-level, both languages provide functionality to call compiled code from lower-level languages Differences General purpose OOP vs. Functional OOP Python is a high-level language for application development R is a collection of functions for data analysis Syntax structure Python syntax forces readability through whitespace R emphasizes parsimony and nesting Creating a function # In Python functions are declared >>> def my func(x):...

How are R and Python alike, and how are they different? Similarities Functional programming paradigm Array of squared values # In Python we use a lambda nested in map >>> map(lambda x: x**2,range(1,11)) [1, 4, 9, 16, 25, 36, 49, 64, 81, 100] # In R we define a function in sapply > sapply(1:10,function(x){return(x**2)}) [1] 1 4 9 16 25 36 49 64 81 100 Object-oriented programming Both languages support robust class structures In R, there are the S3 and S4 classes Easily call C/Fortran code Due to their high-level, both languages provide functionality to call compiled code from lower-level languages Differences General purpose OOP vs. Functional OOP Python is a high-level language for application development R is a collection of functions for data analysis Syntax structure Python syntax forces readability through whitespace R emphasizes parsimony and nesting Creating a function # In Python functions are declared >>> def my func(x):... # In R functions are assigned > my.func<-function(x) {... }

How are R and Python alike, and how are they different? Similarities Functional programming paradigm Array of squared values # In Python we use a lambda nested in map >>> map(lambda x: x**2,range(1,11)) [1, 4, 9, 16, 25, 36, 49, 64, 81, 100] # In R we define a function in sapply > sapply(1:10,function(x){return(x**2)}) [1] 1 4 9 16 25 36 49 64 81 100 Object-oriented programming Both languages support robust class structures In R, there are the S3 and S4 classes Easily call C/Fortran code Due to their high-level, both languages provide functionality to call compiled code from lower-level languages For more info see: Differences General purpose OOP vs. Functional OOP Python is a high-level language for application development R is a collection of functions for data analysis Syntax structure Python syntax forces readability through whitespace R emphasizes parsimony and nesting Creating a function StackOverlow.com discussion on topic # In Python functions are declared >>> def my func(x):... # In R functions are assigned > my.func<-function(x) {... }

Python Dictionaries In Python the dict type is a data structure that represents a key value mapping

Python Dictionaries In Python the dict type is a data structure that represents a key value mapping Working with the dict type # Keys and values can be of any data type >>> fruit dict={"apple":1,"orange":[0.23,0.11],"banana":true }

Python Dictionaries In Python the dict type is a data structure that represents a key value mapping Working with the dict type # Keys and values can be of any data type >>> fruit dict={"apple":1,"orange":[0.23,0.11],"banana":true } # Can retrieve the keys and values as Python lists (vector) >>> fruit dict.keys() ["orange","apple","banana"]

Python Dictionaries In Python the dict type is a data structure that represents a key value mapping Working with the dict type # Keys and values can be of any data type >>> fruit dict={"apple":1,"orange":[0.23,0.11],"banana":true } # Can retrieve the keys and values as Python lists (vector) >>> fruit dict.keys() ["orange","apple","banana"] # Or create a (key,value) tuple >>> fruit dict.items() [("orange",[0.23,0.11]),("apple",1),("banana",true)]

Python Dictionaries In Python the dict type is a data structure that represents a key value mapping Working with the dict type # Keys and values can be of any data type >>> fruit dict={"apple":1,"orange":[0.23,0.11],"banana":true } # Can retrieve the keys and values as Python lists (vector) >>> fruit dict.keys() ["orange","apple","banana"] # Or create a (key,value) tuple >>> fruit dict.items() [("orange",[0.23,0.11]),("apple",1),("banana",true)] # This becomes especially useful when you master Python list comprehension

Python Dictionaries In Python the dict type is a data structure that represents a key value mapping Working with the dict type # Keys and values can be of any data type >>> fruit dict={"apple":1,"orange":[0.23,0.11],"banana":true } # Can retrieve the keys and values as Python lists (vector) >>> fruit dict.keys() ["orange","apple","banana"] # Or create a (key,value) tuple >>> fruit dict.items() [("orange",[0.23,0.11]),("apple",1),("banana",true)] # This becomes especially useful when you master Python list comprehension The Python dictionary is an extremely flexible and useful data structure, making it one of the primary advantages of Python over other languages Luckily, there is a fairly direct mapping between the Python dict and R lists!

R Lists Similarly to the Python dictionary, the R list is a key value mapping

R Lists Similarly to the Python dictionary, the R list is a key value mapping Working with an R list > fruit.list<-list("apple"=1,"orange"=c(0.23,0.11),"banana"=true) > fruit.list $apple [1] 1 $orange [1] 0.55 0.11 $banana [1] TRUE

R Lists Similarly to the Python dictionary, the R list is a key value mapping Working with an R list > fruit.list<-list("apple"=1,"orange"=c(0.23,0.11),"banana"=true) > fruit.list $apple [1] 1 $orange [1] 0.55 0.11 $banana [1] TRUE Notice, however, that R will always treat the value of the key/value pair as a vector; unlike Python, which does not care the value s data type. Of the many R programming paradigms, the vectorization of all data is among the strongest.

R Lists Similarly to the Python dictionary, the R list is a key value mapping Working with an R list > fruit.list<-list("apple"=1,"orange"=c(0.23,0.11),"banana"=true) > fruit.list $apple [1] 1 $orange [1] 0.55 0.11 $banana [1] TRUE Notice, however, that R will always treat the value of the key/value pair as a vector; unlike Python, which does not care the value s data type. Of the many R programming paradigms, the vectorization of all data is among the strongest. Using unlist > unlist(fruist.list) apple orange1 orange2 banana 1.00 0.55 0.11 1.00

Testing a Prime in Python We are interested in testing whether integer X is prime, and to do so we will: Declare a function Use a while-loop Return a boolean

Testing a Prime in Python We are interested in testing whether integer X is prime, and to do so we will: Declare a function Use a while-loop Return a boolean The is prime function def is prime(x): if x%2 == 0: return False else: y=3 while y<x: if x%y==0: return False else: y+=2 return True

Testing a Prime in Python We are interested in testing whether integer X is prime, and to do so we will: Declare a function Use a while-loop Return a boolean The is prime function def is prime(x): if x%2 == 0: return False else: y=3 while y<x: if x%y==0: return False else: y+=2 return True Notice the number of lines in this simple function. In contrast to R, and many other programming languages, Python s whitespace and indentation requirements force verbose code; however, the end result is very readable.

Testing a Prime in R Unlike Python where functions are declared explicitly, R creates functions through assignment. In many ways, this is somewhere between the Python lambda function and an explicit function declaration.

Testing a Prime in R Unlike Python where functions are declared explicitly, R creates functions through assignment. In many ways, this is somewhere between the Python lambda function and an explicit function declaration. The is.prime function is.prime<-function(x) { if(x%%2==0) {return(false) } else { y<-3 while(y<x) ifelse(x%%y==0,return(false),y<-y+2) } return(true) }

Testing a Prime in R Unlike Python where functions are declared explicitly, R creates functions through assignment. In many ways, this is somewhere between the Python lambda function and an explicit function declaration. The is.prime function is.prime<-function(x) { if(x%%2==0) {return(false) } else { y<-3 while(y<x) ifelse(x%%y==0,return(false),y<-y+2) } return(true) } Contrast the width of this version of the function with the length of the previous With the ifelse function R allows you to compress many lines of code A good example of the difference between a largely function programming language with objects (R) vs. a largely objected-oriented language with a core set of functions (Python)

Bridging the Gap with RPy2 Python is widely adopted within the scientific and data communities The combination of SciPy/NumPy/matplotlib represents a powerful set of tools for scientific computing Companies like Enthought are focused on servicing this area

Bridging the Gap with RPy2 Python is widely adopted within the scientific and data communities The combination of SciPy/NumPy/matplotlib represents a powerful set of tools for scientific computing Companies like Enthought are focused on servicing this area There are, however, many times when even this combination cannot match the analytical power of R enter RPy2 RPy2 is a Python package that allows you to call R directly from within a Python script This is particularly useful if you want to use one of R s base functions inside Python

Bridging the Gap with RPy2 Python is widely adopted within the scientific and data communities The combination of SciPy/NumPy/matplotlib represents a powerful set of tools for scientific computing Companies like Enthought are focused on servicing this area There are, however, many times when even this combination cannot match the analytical power of R enter RPy2 RPy2 is a Python package that allows you to call R directly from within a Python script This is particularly useful if you want to use one of R s base functions inside Python For example, creating a function to mimic R s lm in Python even using the above combination would be very time consuming. To avoid this, we will use RPy2 to call lm from within a Python script and plot the data.

Bridging the Gap with RPy2 Python is widely adopted within the scientific and data communities The combination of SciPy/NumPy/matplotlib represents a powerful set of tools for scientific computing Companies like Enthought are focused on servicing this area There are, however, many times when even this combination cannot match the analytical power of R enter RPy2 RPy2 is a Python package that allows you to call R directly from within a Python script This is particularly useful if you want to use one of R s base functions inside Python For example, creating a function to mimic R s lm in Python even using the above combination would be very time consuming. To avoid this, we will use RPy2 to call lm from within a Python script and plot the data. Disclaimer: the syntax can be a bit tricky to understand at first, so allow for a fairly steep learning curve!

Calling lm from Python with RPy2 Creating a linear model in Python

Calling lm from Python with RPy2 Creating a linear model in Python # First import RPy2 and create an instance of robjects import rpy2.robjects as robjects r = robjects.r

Calling lm from Python with RPy2 Creating a linear model in Python # First import RPy2 and create an instance of robjects import rpy2.robjects as robjects r = robjects.r # Create the data by passing a Python list to RPy2, which interprets as an R vector ctl = robjects.floatvector([4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14]) trt = robjects.floatvector([4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69]) group = r.gl(2, 10, 20, labels = ["Ctl","Trt"]) weight = ctl + trt

Calling lm from Python with RPy2 Creating a linear model in Python # First import RPy2 and create an instance of robjects import rpy2.robjects as robjects r = robjects.r # Create the data by passing a Python list to RPy2, which interprets as an R vector ctl = robjects.floatvector([4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14]) trt = robjects.floatvector([4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69]) group = r.gl(2, 10, 20, labels = ["Ctl","Trt"]) weight = ctl + trt # RPy2 uses Python dictionary types to index data robjects.globalenv["weight"] = weight robjects.globalenv["group"] = group

Calling lm from Python with RPy2 Creating a linear model in Python # First import RPy2 and create an instance of robjects import rpy2.robjects as robjects r = robjects.r # Create the data by passing a Python list to RPy2, which interprets as an R vector ctl = robjects.floatvector([4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14]) trt = robjects.floatvector([4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69]) group = r.gl(2, 10, 20, labels = ["Ctl","Trt"]) weight = ctl + trt # RPy2 uses Python dictionary types to index data robjects.globalenv["weight"] = weight robjects.globalenv["group"] = group # Run the models lm_d9 = r.lm("weight ~ group") print(r.anova(lm_d9)) lm_d90 = r.lm("weight ~ group - 1") print(r.summary(lm_d90))

Calling lm from Python with RPy2 Creating a linear model in Python # First import RPy2 and create an instance of robjects import rpy2.robjects as robjects r = robjects.r # Create the data by passing a Python list to RPy2, which interprets as an R vector ctl = robjects.floatvector([4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14]) trt = robjects.floatvector([4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69]) group = r.gl(2, 10, 20, labels = ["Ctl","Trt"]) weight = ctl + trt # RPy2 uses Python dictionary types to index data robjects.globalenv["weight"] = weight robjects.globalenv["group"] = group # Run the models lm_d9 = r.lm("weight ~ group") print(r.anova(lm_d9)) lm_d90 = r.lm("weight ~ group - 1") print(r.summary(lm_d90)) For more info check out the RPy2 documentation (from where this example was stolen!)

Python Resources Where to get Python Binaries for several operating systems and chipsets can be downloaded at http://www.python.org/download/ For OS X and Linux some version of Python comes pre-installed Some controversy over what version to use How to use Python Several very good development environments, once again SO has this covered My opinion, OS X: TextMate and Windows: Eclipse with PyDev, Linux: n/a How to Learn Python Several online resources: Dive into Python and Python Rocks For scientific computing: Enthought Webinars For applications to finance: NYC Financial Python User Group

Python Resources Where to get Python Binaries for several operating systems and chipsets can be downloaded at http://www.python.org/download/ For OS X and Linux some version of Python comes pre-installed Some controversy over what version to use How to use Python Several very good development environments, once again SO has this covered My opinion, OS X: TextMate and Windows: Eclipse with PyDev, Linux: n/a How to Learn Python Several online resources: Dive into Python and Python Rocks For scientific computing: Enthought Webinars For applications to finance: NYC Financial Python User Group The best way to learn any language is to have a problem and solve it using that language!