Hands-on tutorial on usage the Kepler Scientific Workflow System

Similar documents
Kepler scientific workflow orchestrator as a tool to build the computational workflows.

Scientific Workflow Tools. Daniel Crawl and Ilkay Altintas San Diego Supercomputer Center UC San Diego

Approaches to Distributed Execution of Scientific Workflows in Kepler

Accelerating the Scientific Exploration Process with Kepler Scientific Workflow System

Astrophysics and the Grid: Experience with EGEE

Integrated Machine Learning in the Kepler Scientific Workflow System

Kepler and Grid Systems -- Early Efforts --

The Astro Runtime. for data access. Noel Winstanley Jodrell Bank, AstroGrid. with the part of Noel played by John Taylor, IfA Edinburgh/AstroGrid

INDIGO AAI An overview and status update!

San Diego Supercomputer Center, UCSD, U.S.A. The Consortium for Conservation Medicine, Wildlife Trust, U.S.A.

g-eclipse A Framework for Accessing Grid Infrastructures Nicholas Loulloudes Trainer, University of Cyprus (loulloudes.n_at_cs.ucy.ac.

Eclipse Technology Project: g-eclipse

CloudCenter for Developers

INDIGO-DataCloud Architectural Overview

TOSCA Templates for NFV and network topology description

MATLAB. Senior Application Engineer The MathWorks Korea The MathWorks, Inc. 2

N. Marusov, I. Semenov

THE EUCLID ARCHIVE SYSTEM: A DATA-CENTRIC APPROACH TO BIG DATA

Accelerate with IBM Storage Webinar: VersaStack IBM s On-Premise Private Cloud Solutions

THE EUCLID ARCHIVE SYSTEM: A DATA-CENTRIC APPROACH TO BIG DATA

A Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System

Kepler Scientific Workflow and Climate Modeling

INDIGO PAAS TUTORIAL. ! Marica Antonacci RIA INFN-Bari

A High-Level Distributed Execution Framework for Scientific Workflows

TECHNICAL BRIEF. Scheduling and Orchestration of Heterogeneous Docker-Based IT Landscapes. January 2017 Version 2.0 For Public Use

Kepler: An Extensible System for Design and Execution of Scientific Workflows

irods usage at CC-IN2P3: a long history

irods for Data Management and Archiving UGM 2018 Masilamani Subramanyam

Digital Curation and Preservation: Defining the Research Agenda for the Next Decade

Co-ReSyF Hands-on sessions

VIRTUAL OBSERVATORY TECHNOLOGIES

Using Web Services and Scientific Workflow for Species Distribution Prediction Modeling 1

Dynamic, Rule-based Quality Control Framework for Real-time Sensor Data

USING THE BUSINESS PROCESS EXECUTION LANGUAGE FOR MANAGING SCIENTIFIC PROCESSES. Anna Malinova, Snezhana Gocheva-Ilieva

Case Study: CyberSKA - A Collaborative Platform for Data Intensive Radio Astronomy

The National Fusion Collaboratory

Cloud interoperability and elasticity with COMPSs

Distributed Archive System for the Cherenkov Telescope Array

Storage Management in INDIGO

Luigi Build Data Pipelines of batch jobs. - Pramod Toraskar

Data 101 Which DB, When. Joe Yong Azure SQL Data Warehouse, Program Management Microsoft Corp.

Multi-Cloud and Application Centric Modeling, Deployment and Management with Cisco CloudCenter (CliQr)

Ryan Rich - Software Developer & Architect

Using AMNS data within an Integrated Tokamak Modelling Environment

RESO s GitHub Launch with new WebAPI Adapter

Open Material Property Library With Native Simulation Tool Integrations MASTO

Dbvisit Product and Company Website Copy

Managing Petabytes of data with irods. Jean-Yves Nief CC-IN2P3 France

First European Globus Community Forum Meeting

REALIZE YOUR. DIGITAL VISION with Digital Private Cloud from Atos and VMware

Migrating Applications with CloudCenter

Scripting without Scripts: A User-Friendly Integration of R, Python, Matlab and Groovy into KNIME

Fit für die MATLAB EXPO

Technical Computing with MATLAB

vrealize Automation, Orchestration and Extensibility

Requirements for data catalogues within facilities

Workflow Fault Tolerance for Kepler. Sven Köhler, Thimothy McPhillips, Sean Riddle, Daniel Zinn, Bertram Ludäscher

The Virtual Observatory and the IVOA

Grid Scheduling Architectures with Globus

Understanding the latent value in all content

ICIT. Brian Hiller ESRI Account Manger. What s new in ArcGIS 10

Continuous Delivery for Cloud Native Applications

Medici for Digital Cultural Heritage Libraries. George Tsouloupas, PhD The LinkSCEEM Project

One Platform Kit: The Power to Innovate

How to build Scientific Gateways with Vine Toolkit and Liferay/GridSphere framework

Virtual Observatory publication of interferometry simulations

How the INDIGO-DataCloud computing platform aims at helping scientific communities

ELFms industrialisation plans

DevNet Workshop-Hands-on with CloudCenter and Jenkins

Go Faster: Containers, Platforms and the Path to Better Software Development (Including Live Demo)

Automating Security Practices for the DevOps Revolution

Usage of the Astro Runtime

Petr Suchomel Architect, NetBeans Mobility

Advanced School in High Performance and GRID Computing November Introduction to Grid computing.

Containerization Dockers / Mesospere. Arno Keller HPE

Easy Access to Grid Infrastructures

Cloud Open Source Innovation on Software Defined Storage

Europlanet IDIS: Adapting existing VO building blocks to Planetary Sciences

Pegasus Workflow Management System. Gideon Juve. USC Informa3on Sciences Ins3tute

2/12/11. Addendum (different syntax, similar ideas): XML, JSON, Motivation: Why Scientific Workflows? Scientific Workflows

More AWS, Serverless Computing and Cloud Research

Hybrid Cloud Solutions

Multi-Cloud and Application Centric Modeling, Deployment and Management with Cisco CloudCenter (CliQr)

CrossGrid testbed status

Mitigating Risk of Data Loss in Preservation Environments

ACET s e-research Activities

Analytics in the Cloud Mandate or Option?

Integrate MATLAB Analytics into Enterprise Applications

Beyond X.509: Token-based Authentication and Authorization with the INDIGO Identity and Access Management Service

Microsoft vision for a new era

Isomorphic Kotlin. Troy

Docker and Oracle Everything You Wanted To Know

XSEDE s Campus Bridging Project Jim Ferguson National Institute for Computational Sciences

The LHC Computing Grid

EMC Hybrid Cloud. Umair Riaz - vspecialist

Provisioning Intel Rack Scale Design Bare Metal Resources in the OpenStack Environment

Outline. S: past, present and future Some thoughts. The 80s. Interfaces - 60s & 70s. Duncan Temple Lang Department of Statistics UC Davis

Big Data Applications using Workflows for Data Parallel Computing

Event Driven network automation

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Transcription:

Hands-on tutorial on usage the Kepler Scientific Workflow System (including INDIGO-DataCloud extension) RIA-653549 Michał Konrad Owsiak (@mkowsiak) Poznan Supercomputing and Networking Center michal.owsiak@man.poznan.pl e-research Summer Hackfest

Agenda Kepler overview Main Kepler's concepts Examples of Kepler application into science Hands on tutorials

After this workshop You will understand: general idea of Kepler how to build simple workflow how to build control statements how to access Python from Kepler how Kepler contributes to INDIGO-DataCloud

Essential info about Kepler Scientific Workflow System Built on top of Ptolemy II framework Actor based computations Provides different models of execution (via Directors) Supported by NSF-funded team: UC Davis UC Santa Barbara UC San Diego Used across different fields of science: ecology, engineering, geology, physics, bioinformatics, biology, nuclear fusion, astrophysics, nanotechnology, climate change, etc.

Kepler application (GUI) Looks nice, isn't it?!

Kepler - built-in components Around 500 components ready to use (~1 000 000 LOC) mathematics: functions, operators, actors dedicated for R/Matlab visualization: plotters, graphs, image displays data manipulation: primitive types, strings, tables, records, GeoData, GIS, Data Access Protocol (DAP) 2.0 data access: support for Oracle, MySQL, MS Access, MS SQL, PostgreSQL http://kepler-project.org

Kepler - built-in components Web-Services: SOAP, REST, XML processing executing external code: SSH, Cloud, Hadoop, UNICORE, glite, Nimrod, Serpens Kepler as batch job field specific modules: BioKepler, GAMESS Input generator, Sensor Processing and Acquisition Network provenance module: you can track workflow's execution

Support for various languages main language: scripting: Python: C/C++: Fortran: Write once, run anywhere! Java Groovy, JavaScript, Jython external process, called via JNI external process, can be turned into actor can be called via JNI (or via JNA) These guys help you call native code from Java

Various execution models Execution models: Sequential Data Flow Dynamic Data Flow Process Network Continuous Time These two are our main concern today!

Building workflows Kepler provides set of building blocks you can use: workflow director actor port relation

Building workflow workflow allows to compose data flow between elements Drag'n'Drop to put elements on canvas workflow can be started in GUI or from CLI

Building workflow

Building workflow Essential components are available from tool bar

Building components

Directors Directors orchestrate the execution of workflow SDF - Sequential Data Flow DDF - Dynamic Data Flow PN - Process Network CT - Continuous Time

Actor - basic building block

Actor - basic building block You can: use existing actors build your own actors (Java, Python, Groovy) import actors from other people install whole modules

Composite actors

Use-cases Use-cases

Nuclear Fusion - EFDA ITM-TF/EUROfusion Integration of simulation tools for ITER and DEMO plasmas Kepler as workflow platform Numerous user codes: Fortran, C++, C, Python, Java, Matlab Common database for information exchange 90+ users, workflow developers and support team Custom build system: we extend default version of Kepler 2.5 with custom components Version tracking: workflows, actors, Kepler, results

Nuclear Fusion - EFDA ITM-TF/EUROfusion

Nuclear Fusion - EFDA ITM-TF/EUROfusion

Nanotechnology A software suite - ANELLI has been designed with the aim of performing ring analysis Requirements: easy way to add new actors easy way to parametrise and trigger components possibility to update codes live running in distributed environment (parametric scans) workflows should be transparent for users

ANELLI workflow

Astronomy Several workflows available (DIAPL, BF) Based on IVOA (International Virtual Observatory Alliance) standards The goal is to create modules that can be used by multiple groups of astronomers support for VO services (SAMP, ds9, Topcat) workflows should provide some level of interaction integration with irods (Integrated Rule-Oriented Data System) integration of numerous codes user friendly GUI

Astronomy - DIAPL workflow

INDIGO-DataCloud H2020 project - approved in January 2015 26 European partners, 11 European countries development of open source cloud platform for computing and data storage - DataCloud targets multi - disciplinary scientific communities deployable on hybrid Cloud infrastructures (public or private) answer to call raised by increasing demand for easy access to distributed Cloud/Grid resources

INDIGO-DC module for Kepler indigokepler - module that uses INDIGO-DC PaaS services First version of module is available; gradually extended Based on FutureGateway API Available via Docker image Sources are available in GitHub indigoclient indigokepler docker-kepler - https://github.com/indigo-dc/indigoclient - https://github.com/indigo-dc/indigokepler - https://github.com/indigo-dc/docker-kepler

Indigo - DataCloud https://youtu.be/sedbzfzjrve

Questions? Questions?

Hands on part - Docker Kepler "For the things we have to learn before we can do them, we learn by doing them." - Aristotle http://goo.gl/3rurx7