[%]%async_run. an IPython notebook* magic for asynchronous (code) cell execution. Valerio Maggio Researcher


[%]%async_run an IPython notebook* magic for asynchronous (code) cell execution Valerio Maggio Researcher valeriomaggio@gmail.com @leriomaggio

Premises

Jupyter Notebook

Jupyter Notebook Example Multiple Kernels

Currently in Use at

Jupyter Architecture

Notebook Document Format: Jupyter Notebooks are an open document format based on JSON. They contain a complete record of the user's sessions and embed code, narrative text, equations and rich output.

Interactive Computing Protocol: The Notebook communicates with computational Kernels using the Interactive Computing Protocol, an open network protocol based on JSON data over ZMQ and WebSockets.

The Kernel: Kernels are processes that run interactive code in a particular programming language and return output to the user. Kernels also respond to tab completion and introspection requests.
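To make the "JSON data" part of the Interactive Computing Protocol more concrete, here is a minimal sketch of the dictionary a client builds when it asks a kernel to run some code (an `execute_request`). The helper name `make_execute_request`, the `demo` username and the protocol version string are assumptions made for illustration. On the real ZMQ channel the message is additionally split into several signed frames, which this sketch does not show.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="demo"):
    """Build a dict shaped like a Jupyter `execute_request` message.

    Hypothetical helper: only the JSON structure is illustrated here,
    not the ZMQ framing or the HMAC signature used on the wire.
    """
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,      # unique id of this message
            "session": session_id,           # id of the client session
            "username": username,
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",                # assumed protocol version
        },
        "parent_header": {},                 # empty: this message starts a request
        "metadata": {},
        "content": {
            "code": code,                    # source of the cell to run
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


message = make_execute_request("1 + 1", session_id=uuid.uuid4().hex)
wire_text = json.dumps(message)              # the JSON text that gets transported
```

The kernel answers with messages of its own (status updates, results, errors) whose `parent_header` echoes the header above, which is how a client matches output to the cell that produced it.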

More than 50 kernels: github.com/ipython/ipython/wiki/ipython-kernels-for-other-languages

Reproducible Research

Motivations

Sometimes a notebook has to run heavy computations, i.e. computationally intensive code cells. Moreover, the computation may have to be executed on a remote server machine (reminder: the Jupyter Notebook Server). In the general case this could work, but since...

"Anything that can possibly go wrong, does." Murphy's Law, 1952

Main Goal

Define a strategy to cope with this kind of situation, keeping the following requirements in mind:
- Allow execution on a remote machine (as well)
- Avoid busy waiting on the client machine
- Keep the notebook as interactive as possible
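The "no busy waiting" and "keep interactivity" requirements can be illustrated with the standard library alone. The sketch below is not the `%async_run` implementation: it submits a stand-in computation to a background worker and registers a callback, so the caller gets control back immediately. The function `heavy_computation` and the `results` list are hypothetical names, and a thread pool is used here purely to keep the example self-contained.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def heavy_computation(n):
    """Stand-in for a computationally intensive code cell."""
    time.sleep(0.1)                           # simulate a slow step
    return sum(i * i for i in range(n))


results = []                                  # filled in by the callback
executor = ThreadPoolExecutor(max_workers=1)

# submit() returns immediately with a Future: the caller is not blocked.
future = executor.submit(heavy_computation, 1000)

# The callback runs once the work is done, so nothing has to poll for it.
future.add_done_callback(lambda done: results.append(done.result()))

# ... the notebook would stay responsive here and could run other cells ...

value = future.result(timeout=5)              # explicit wait, only for this demo
executor.shutdown(wait=True)                  # worker and callback have finished
```

In a notebook the callback would be the natural place to push the output back to the front end; here it only appends to a list.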

[%]%async_run an IPython notebook* magic for asynchronous (code) cell execution What I learned during my adventures in the world of Jupyter, Multiprocessing and Asynchronous I/O

Jupyter Ecosystem

IPython Magics (since IPython 3.x)

IPython has a system of commands called magics, which:
- provide effectively a mini command language that is orthogonal to the syntax of Python
- are easily extensible by the user with new commands

Magics are meant to be typed interactively, so they follow command-line conventions, e.g. whitespace for separating arguments and dashes for options.

Magics come in two kinds:
- Line magics: prefixed by one % character
- Cell magics: marked by two percent characters (%%)

[%]%timeit Line Magic Cell Magic

Activate matplotlib inline-backend to have charts displayed inline with notebook cells

Custom Magics: how to
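As a rough illustration of how a custom magic can be defined, the sketch below writes one plain function that works both as a line magic and as a cell magic, then registers it only when an IPython shell is actually running. The magic name `shout` and its behaviour are made up for this example. `register_magic_function` is the registration method exposed by the IPython shell object, and the import is guarded so the snippet also runs in a plain Python interpreter.

```python
def shout(line, cell=None):
    """Upper-case the argument line (%shout) or the cell body (%%shout).

    IPython calls the function with `line` only for a line magic,
    and with both `line` and `cell` for a cell magic.
    """
    text = line if cell is None else cell
    return text.upper()


try:
    from IPython import get_ipython
except ImportError:                           # IPython is not installed
    get_ipython = None

# get_ipython() returns None when no interactive IPython shell is active.
shell = get_ipython() if get_ipython is not None else None
if shell is not None:
    shell.register_magic_function(shout, magic_kind="line_cell", magic_name="shout")
```

Inside a notebook, `%shout hello` would then evaluate to the upper-cased line, and a cell starting with `%%shout` would pass its whole body to the same function.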

Notebook Data Format
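Since a notebook is a JSON document, its overall shape can be sketched with plain dictionaries. The example below builds a two-cell notebook by hand and round-trips it through `json`. The helper `new_code_cell` and the cell contents are hypothetical, and the minor format version is an assumption. In practice the `nbformat` package would normally be used to create and validate notebooks.

```python
import json


def new_code_cell(source):
    """Return a dict shaped like an nbformat 4 code cell that has not run yet."""
    return {
        "cell_type": "code",
        "execution_count": None,              # None until the cell is executed
        "metadata": {},
        "outputs": [],                        # rich outputs are stored here
        "source": source,
    }


notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,                      # assumed minor version
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Heavy computation"},
        new_code_cell("result = sum(range(10))"),
    ],
}

# A .ipynb file is this structure serialised as JSON text.
ipynb_text = json.dumps(notebook, indent=1)
restored = json.loads(ipynb_text)
cell_types = [cell["cell_type"] for cell in restored["cells"]]
```

Adding a cell programmatically then amounts to appending another dictionary to `notebook["cells"]` and writing the JSON back.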

Back to our issue to solve

"Anything that can possibly go wrong, does." Murphy's Law, 1952

Why that?

First Idea (very early stage)

Run the heavy computation, use the write API to add a new cell to the notebook, and that's it.

Drawbacks:
- No interactivity
- No way to auto-refresh the content

Check whether there's already an existing solution to this! Takeaway: avoid reinventing the wheel!

%run to the rescue (?) ipython.org/ipython-doc/3/interactive/magics.html#magic-run

%run

Test in the notebook

A bit more complicated

Test it! Result: blocking call, no interactivity

runipy to the rescue (?) https://github.com/paulgb/runipy

A closer look

Notebook Runner

runipy features

(+) Notebook APIs
(+) Kernel Protocol Messaging
(+) Support for multiple document formats (nbformat.versions)
(-) No interactivity
(-) No support for online/non-blocking execution
(~) No support for multi-processing

Idea: try to borrow some code from runipy and re-implement it as an IPython Magic (w/ steroids)

But if you : Hangs on protocol communication and it has no link with the current shell

IPython is based on Tornado!

Reference

Client-side

Limitations and Future Works

- Pickle/JSON serialisation dependency: a major flaw of the Python multiprocessing module
- Try to use dill (multiprocessing*)
- Improve the infrastructure to handle errors: not really handled yet apart from IPython/JS Integration
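The serialisation limitation is easy to reproduce with the standard library. Work sent to another process has to be pickled, and `pickle` stores functions by reference (module plus name), so anything it cannot look up by name, such as a lambda, fails. The helper `can_pickle` below is a hypothetical name used only for this sketch. Libraries such as dill serialise functions by value instead, which is presumably why the slide lists it as something to try.

```python
import math
import pickle


def can_pickle(obj):
    """Return True if `obj` survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A function importable by name is pickled as a reference to that name.
named_ok = can_pickle(math.sqrt)

# A lambda has no importable name, so the standard pickler rejects it.
lambda_ok = can_pickle(lambda x: x * x)
```

Code typed into a notebook cell is in a similar position to the lambda: it is defined interactively, so shipping it to a worker process needs either a by-value serialiser or sending the source text itself.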

There is not going to be any demo, due to the aforementioned Murphy's Law :P

Thanks a lot for your kind attention @leriomaggio valeriomaggio@gmail.com +ValerioMaggio it.linkedin.com/in/valeriomaggio