Python, SageMath/Cloud, R and Open-Source
Harald Schilly
2016-10-14, TANCS Workshop, Institute of Physics, University of Graz
The big picture
Software up to the end of 1979:
- Fortran: LINPACK (later LAPACK), BLAS, etc.
- Macsyma (later Maxima): symbolic computing
- S programming language (later R)
Open-source until 2000:
- R: emerges as a serious statistics and data analysis platform
- Maxima: open-source computer algebra system
- Python: invented in the early 1990s, based on ABC, very user-friendly
Mid-2000s until now:
- Python: growing usage in scientific computing, data analysis, machine learning, etc.
- SageMath: Python-based environment for mathematical computing
- R: de facto standard for scientific publications in statistics
- ... and many more emerging tools and libraries, like Julia
Shifting Paradigm: Open by Default
Open-source and Open-access
- Scientific publications, databases, programming languages, libraries, file formats, etc. are all shifting towards being open and accessible.
Networked Computing
- The personal computing era gave rise to packaged software for users.
- This model has already shifted towards Software as a Service (SaaS).
Collaboration
- Software development happens publicly and worldwide (e.g. GitHub).
- Research collaboration has no borders.
- Proprietary software locks users into walled gardens.
- Open Data initiatives: Zenodo, OpenAIRE, ...
- Reproducible research.
SageMath
http://sagemath.org/
Quick Survey
- Who has ever heard of SageMath before?
- Who has used SageMath?
- Who has contributed to SageMath?
History
- 2004: William Stein started SageMath at Harvard. Motivation: frustration with closed-source mathematics software, and with Magma in particular.
- 2005: First release of Sage.
- 2006: After a lot of hard work, a small team formed around it at the University of Washington.
Motivation and Goals
Motivation
- Frustration with the state of mathematical software: only commercial players and fragmented academic software.
Goals
Some of the general goals behind SageMath:
- Unify fragmented academic mathematical software.
- Make installation and distribution of the software easier.
- Use the type system to express mathematical knowledge.
- Allow instances of such types to be mixed in calculations (coercion), e.g., multiplying a matrix over $\mathbb{Z}$ with an element of $\mathbb{F}_2$.
- Foster a mathematical research platform.
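The coercion goal can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not SageMath's coercion framework: the class `Mod2` and the helper `scale` are hypothetical names invented for this illustration. The point is that a canonical map from the integers to F_2 exists (reduce modulo 2), so the integer matrix is mapped into F_2 before the multiplication takes place.

```python
# Toy model of coercion; NOT SageMath's actual implementation.
# Mod2 and scale are hypothetical names used only for illustration.

class Mod2:
    """An element of the finite field F_2 = {0, 1}."""

    def __init__(self, value):
        self.value = value % 2

    def __add__(self, other):
        return Mod2(self.value + other.value)

    def __mul__(self, other):
        return Mod2(self.value * other.value)

    def __eq__(self, other):
        return isinstance(other, Mod2) and self.value == other.value

    def __repr__(self):
        return str(self.value)


def scale(matrix, scalar):
    """Multiply an integer matrix by an element of F_2.

    The integer entries are first coerced into F_2 (reduction modulo 2);
    only then is the product computed, entirely inside F_2.
    """
    coerced = [[Mod2(entry) for entry in row] for row in matrix]
    return [[entry * scalar for entry in row] for row in coerced]


M = [[1, 2], [3, 4]]          # a matrix over the integers
result = scale(M, Mod2(1))    # the result lives over F_2
print(result)                 # [[1, 0], [1, 0]]
```

In SageMath itself the user never calls such a helper; the system is designed to find the common parent of both operands on its own.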
Solutions
- Uses a common, widely used programming language, with types expressing mathematical objects in code.
- Packages many open-source tools in a consistent manner.
- Stands on the shoulders of giants: uses existing software packages like Pari/GP, Python, Matplotlib, R, SymPy, Maxima, etc. In total, about 100 software packages.
- The core library uses these tools and implements its own algorithms.
- An extensive test suite ensures that the whole collection of functionality works well together.
Bold Mission Statement
Create a viable free open source alternative to Magma, Maple, Mathematica and Matlab.
Python: core engine behind SageMath
Benefits of Python
- Easy to learn and teach: many ideas originate from the ABC language.
- Powerful and universal: mathematical objects are instances of types in Python.
- Widely used and supported by the industry: Google, Microsoft, etc.
- Spillover effect: learning SageMath also means learning Python.
- Since the mid-2000s, a thriving ecosystem in engineering, numerical mathematics, big data and machine learning.
- Many other Python libraries can be accessed from within SageMath.
Example: Mathematical Types in Knot Theory
First, define a link by its oriented Gauss code.
    K = Link([[[-1, 2, -4, 5], [1, -3, 4, -6], [-2, 3, -5, 6]], [-1, 1, -1, 1, -1, 1]])
Orientation: K.orientation() = [-1, 1, -1, 1, -1, 1]
Number of components: K.number_of_components() = 3
Alexander polynomial: K.alexander_polynomial() = $t^{-2} - 4t^{-1} + 6 - 4t + t^{2}$
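Some simple invariants can be read off the oriented Gauss code directly, without SageMath. The following plain-Python sketch computes the pairwise linking numbers, using the standard definition (half the sum of the signs of the crossings shared by two components). The function name `linking_numbers` is a hypothetical helper, and the code assumes the convention that the k-th sign in the second list belongs to crossing k.

```python
from itertools import combinations

# Oriented Gauss code from the slide: one list per component, plus crossing signs.
gauss_code = [[-1, 2, -4, 5], [1, -3, 4, -6], [-2, 3, -5, 6]]
signs = [-1, 1, -1, 1, -1, 1]   # assumed: sign of crossing 1, 2, ..., 6


def linking_numbers(components, crossing_signs):
    """Return {(i, j): linking number} for every pair of components."""
    # Set of crossing labels each component passes through (ignoring over/under).
    visited = [{abs(c) for c in comp} for comp in components]
    result = {}
    for i, j in combinations(range(len(components)), 2):
        shared = visited[i] & visited[j]          # crossings between i and j
        total = sum(crossing_signs[c - 1] for c in shared)
        result[(i, j)] = total // 2               # half the signed count
    return result


print(len(gauss_code))                       # 3 components
print(sum(signs))                            # writhe of the diagram: 0
print(linking_numbers(gauss_code, signs))    # {(0, 1): 0, (0, 2): 0, (1, 2): 0}
```

For this link all three pairwise linking numbers vanish, so the invariant that distinguishes it has to be something finer, such as the Alexander polynomial.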
Python: numerical mathematics and data analysis
Since the mid-2000s, several driving forces behind Python have established a solid basis for numerical mathematics:
NumPy
- n-dimensional array library (tensor arithmetic)
- bindings for Fortran/C/C++ (same data structure, reuses existing libraries)
- SciPy and other libraries build on it
Example non-profit:
- NumFOCUS: sponsors PyData, pandas, Jupyter, PyTables, Julia, Matplotlib, Astropy, FEniCS, ...
Example for-profit:
- Google: Python/PSF, GSoC, TensorFlow, ...
- Continuum.io: Conda/Anaconda, Bokeh, Numba, Dask, Blaze, ...
R: open-source statistical software
http://www.r-project.org
- Based on the S language (domain-specific, from the 1970s).
- A project similar to SageMath, but for statistics.
- Started in the first half of the 1990s, 1.0 release in 2000.
- Popularized data frames (inherited from S): expressive and powerful manipulation of typed columnar data. (The idea lives on in Python's pandas library, Apache Spark, Julia, etc.)
- R packages are an ecosystem for experimentation and innovation (almost 10,000)!
R: Packages
- R is famous for plotting, e.g. ggplot2, which implements the Grammar of Graphics:
    library(ggplot2)
    p <- ggplot(mtcars, aes(factor(cyl), mpg))
    p + geom_violin(draw_quantiles = c(0.25, 0.5, 0.75))
- Bioconductor: analyzing genomic data
- shiny: interactive websites as a report
- many more: dplyr, tidyr, stringr, zoo (time series), quantmod (finance), maptools (spatial data), etc.
- CRAN Task Views: https://cran.r-project.org/web/views/
SageMathCloud
http://cloud.sagemath.com/
Second Quick Survey
- Who has ever heard of SageMathCloud before?
- Who has an account on SageMathCloud?
- Who has ever had trouble running SageMath or some other scientific open-source software locally on your computer?
Solution for a changing world
Problem: Although SageMath has a wonderful user base, it stopped growing at about 50K active users.
Key factors: installation is difficult since SageMath is a large package, it requires a non-Windows OS or a VM, users have to manage their own system and files, etc.
Solution: Create an online SaaS platform with these benefits:
- Zero setup: all software and servers are updated and maintained for you.
- Access your project from anywhere via the internet.
- Collaboration: real-time synchronized computational documents (SageWS and Jupyter) in shared projects, communication via chat, task lists, ...
- Backup and snapshots: never lose your work again!
- Teach a class: all students are immediately ready to go; manage and grade assignments, help students directly, ...
- Author Markdown and LaTeX documents directly where your research happens.
- Publish your work online.
SageMathCloud Project
The cornerstones of the SageMathCloud project:
- Fully open-source distributed online application.
- Leverages modern web standards, cloud computing and service orchestration.
- Provides SageMath, R, Python, Jupyter, Julia, Anaconda, LaTeX, Octave, and many more software packages through its novel UI.
- Is backed by SageMath, Inc., a company founded by William Stein in 2015.
- The goals of SageMath, Inc. align with those of SageMath: making open-source software more accessible, removing friction in using it, and enhancing its development.
Example: SageTeX
LaTeX with embedded calculations
SageTeX is a LaTeX package for running SageMath computations right inside a document. Results are even cached between runs!
Examples
Inline commands:
- \sage{factor(2016)} = $2^5 \cdot 3^2 \cdot 7$
- \sage{integrate(x^2*sin(x), x)}: $\int x^2 \sin(x)\,dx = -(x^2 - 2)\cos(x) + 2x\sin(x)$
Define a graph $G_4$:
    \begin{sageblock}
    G4 = DiGraph({1:[2,2,3,5],
                  2:[3,4], 3:[4],
                  4:[5,7], 5:[6]},
                 multiedges=True)
    G4plot = G4.plot(layout='circular')
    \end{sageblock}
Plot via \sageplot{G4plot}:
[Figure: circular plot of the directed graph $G_4$ on the vertices 1 to 7]
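The two inline results can be cross-checked in plain Python, without SageMath. This is only a sanity check under stated assumptions, not what SageTeX does internally: `factorize`, `antiderivative` and `integrand` are hypothetical helper names, the factorization uses naive trial division, and the closed form $-(x^2 - 2)\cos(x) + 2x\sin(x)$ is checked numerically by comparing its central-difference slope with $x^2 \sin(x)$ at a few sample points.

```python
import math


def factorize(n):
    """Trial division: return {prime: exponent} for an integer n >= 2."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:                       # whatever remains is a prime factor
        factors[n] = factors.get(n, 0) + 1
    return factors


def antiderivative(x):
    """Closed form for the integral: -(x^2 - 2) cos(x) + 2 x sin(x)."""
    return -(x**2 - 2) * math.cos(x) + 2 * x * math.sin(x)


def integrand(x):
    return x**2 * math.sin(x)


print(factorize(2016))   # {2: 5, 3: 2, 7: 1}

# The central difference of the antiderivative should reproduce the integrand.
h = 1e-5
for x in (0.5, 1.0, 2.0):
    slope = (antiderivative(x + h) - antiderivative(x - h)) / (2 * h)
    assert abs(slope - integrand(x)) < 1e-6
```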
Demo: SageMath and SageMathCloud
Thank You!
Harald Schilly, harald@schil.ly
© 2016