Kepler Scientific Workflow and Climate Modeling


Kepler Scientific Workflow and Climate Modeling. Ufuk Turuncoglu, Istanbul Technical University, Informatics Institute; Cecelia DeLuca and Sylvia Murphy, NOAA/ESRL Computational Science and Engineering Dept., NESII. 03/06/2014, IS-ENES Workshop on Workflows in Earth System Modeling.

Outline: Motivation; Our Approach; CCSM Workflow Components; Provenance and Metadata Collection; Use Case: basic CCSM configuration (all components active) on two computing platforms, TeraGrid and a cluster (OTP); Experiences; Plans.

Common Problems in Earth System Modeling: changing the science in complex ESMs. There is a shift from standalone model components (atm, ocn, lnd, etc.) to multi-component ESMs with high resolution in time and space. Modeling systems are hard to use (they must be built and run on numerous kinds of computational resources), and the increasing volume of data is hard to transfer, store, and process. There is a growing need to test the models regularly, especially during development, using standardized test cases to find bugs, check performance, and verify the integrity of the modeling system, and to reproduce results on different computational platforms. Experiments also involve numerous parameter changes that are hard to record and track (ensemble runs, model sensitivity studies).

Common Problems in Earth System Modeling: changing the science on complex HPC systems. These systems involve many technologies, each with its own learning curve (operating systems, job schedulers, parallel file systems, programming and usage of GPU systems). It is hard and time-consuming to develop and test ESMs on HPC systems because of different authentication systems, data transfer protocols, and software variants (compilers, libraries, etc.).

Proposed Approach: integrating ESMs with Scientific Workflow (SWF) applications. The SWF application can act as an abstraction layer that hides the details of different technologies and computing environments. It can orchestrate the underlying ESMs, pre/post-processing tools, and data repositories; collect metadata and provenance information in a standardized and automated way; and create a work environment that facilitates easy, standardized information exchange in an efficient way. Prototype system (collaboration with NOAA and the ESMF team): create a self-describing earth system model (NCAR's CESM).

Components of the Prototype Application. The prototype system was tested in both a grid environment (TeraGrid) and a conventional cluster (NCAR's Bluefire, SSO).

Kepler WF and Modifications. Kepler is open source, platform independent (Java), supports different models of computation (MoC) via directors, and is modular (new modules can be designed in Java). Modifications: a new module, earth, containing Grid-based actors, CESM actors, and post-processing actors.

Provenance / Metadata Collection Tool: a hierarchical (multi-layered) approach. CCSM4 (= CESM) is a multi-component global ESM, which makes it complicated to collect provenance and metadata information. The main output format is XML.

Conceptual CCSM Workflow

Kepler CCSM Workflow: TeraGrid. Build the model, run the model, and collect provenance information; the workflow creates WS-GRAM XML job description files automatically.
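As an illustration, a generated WS-GRAM job description might look like the following sketch (the executable path, directories, queue name, and processor count are hypothetical, not taken from the prototype):

```xml
<job>
  <executable>/work/user/ccsm4/ccsm.exe</executable>
  <directory>/scratch/user/case01/run</directory>
  <count>64</count>
  <queue>normal</queue>
  <jobType>mpi</jobType>
  <stdout>/scratch/user/case01/ccsm.log</stdout>
  <stderr>/scratch/user/case01/ccsm.err</stderr>
</job>
```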

Kepler CCSM Workflow: Cluster (OTP). Build the model, run the model, and collect provenance information, using workflow-wide global parameters. The workflow was modified from the TeraGrid version to run on the cluster.

Examples of Collected Provenance Information. System Layer: information related to the OS, collected by a Perl script (OS type, patch version and level, kernel version). This script is also responsible for defining environment variables for the commands triggered by the workflow.
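The prototype used a Perl script for this layer; an equivalent minimal sketch in Python (the function name and record layout are illustrative, not the original script) could look like:

```python
import platform

def collect_system_provenance():
    """Collect basic OS-level provenance: OS type, kernel
    release/version, architecture, and hostname."""
    return {
        "os_type": platform.system(),      # e.g. "Linux"
        "os_release": platform.release(),  # kernel release string
        "os_version": platform.version(),  # kernel build / patch details
        "machine": platform.machine(),     # e.g. "x86_64"
        "hostname": platform.node(),
    }
```

The real script also exports environment variables for the commands triggered by the workflow; that part is site-specific and omitted here.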

Examples of Collected Provenance Information. Application Layer: the ESMF Attribute class, holding information related to the exchange fields. It can be integrated at the model (atm, ocn, ...) and/or driver level.

Examples of Collected Provenance Information. Application Layer: information related to the build stage, collected by a Python script (compiler type and version, compiler flags used, and the list of environment variables and their values).
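A minimal sketch of such a build-stage collector (the function name and record layout are assumptions, not the original script):

```python
import os
import shutil
import subprocess

def collect_build_provenance(compiler="gcc", flags=None):
    """Record compiler identity/version, the flags used, and a
    snapshot of the environment variables at build time."""
    record = {
        "compiler": compiler,
        "compiler_flags": list(flags or []),
        "compiler_version": None,
        "environment": dict(os.environ),
    }
    # Only query the compiler if it is actually on PATH.
    if shutil.which(compiler):
        result = subprocess.run([compiler, "--version"],
                                capture_output=True, text=True)
        if result.stdout:
            # First line of "--version" output identifies the compiler.
            record["compiler_version"] = result.stdout.splitlines()[0]
    return record
```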

Follow-up Workflows. A follow-up workflow is triggered by another workflow through a modified Kepler loader; the job description is passed in XML form.

Lessons Learned in Prototype WF Design. Was Kepler a good choice of scientific WF? It is easy to develop new actors (Java, object-oriented design) and there are different director options (sequential, parallel), but the provenance module is not strong, there is no mechanism to keep track of workflow versions (an SVN- or Git-style approach is needed), and the checkpoint/restart mechanism is an open question. Integrating model components with WFs is not easy: models have non-standardized interfaces (i.e. ASCII-formatted configuration files or namelists; CESM uses XML). The current approach requires developing model-specific actors to configure, build, and run the model; it is not a generic solution and is sensitive to changes in the model.

Lessons Learned in Prototype WF Design. WFs and integration with computing platforms: jobs are submitted, managed, and monitored via various types of job schedulers (LSF, OpenPBS, etc.) and different authentication systems (OTP, VPN, etc.). A set of global environment variables (TG_CLUSTER_HOME, TG_CLUSTER_PFS, etc.) that point to user scratch, temp, and archive directories might help to create generic workflows. Controlling the whole environment (model, computing system, processing and visualization tools) from an external WF system is not efficient (Web Services, client/server?). In general, a WF needs continuous interaction with the computing platform (monitoring, error checking, etc.). Creating an end-to-end WF system for Earth system science is a giant task; it needs more collaboration!
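For example, a workflow could resolve its working directories from such site-provided variables instead of hard-coding paths. A sketch (the fallback defaults are hypothetical):

```python
import os

def resolve_workflow_dirs(env=None):
    """Map site-specific environment variables (TeraGrid-style names)
    to generic workflow directory roles, with local fallbacks."""
    env = os.environ if env is None else env
    return {
        # TG_CLUSTER_HOME / TG_CLUSTER_PFS are the TeraGrid conventions
        # mentioned above; the fallbacks are illustrative only.
        "home": env.get("TG_CLUSTER_HOME", env.get("HOME", "/tmp")),
        "scratch": env.get("TG_CLUSTER_PFS", "/tmp/scratch"),
    }
```

With this indirection, the same workflow definition runs on any site that exports the agreed variable names.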

What might be needed in the future? We need a new approach (just thoughts): integrating models with standardized web services (WS) and hooking them to the WS actors in workflow applications. The client can be simplified this way; there is no need to create custom actors for every model and external tool. Kepler has a set of WS actors that can be used to interact with SOAP services, and the ESMF web services extension (ESMF-WS) and OpenMI might help to create web-service-enabled model components. http://www.teragridforum.org/mediawiki/index.php?title=gateway_workflow_survey
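To illustrate, a SOAP 1.1 request to such a web-service-enabled model can be assembled with the Python standard library alone; the operation name, parameters, and service namespace below are hypothetical:

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def build_soap_request(operation, params,
                       service_ns="http://example.org/model-ws"):
    """Build a SOAP 1.1 envelope invoking `operation` with `params`
    on a hypothetical model-control web service."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    op = ET.SubElement(body, "{%s}%s" % (service_ns, operation))
    for name, value in params.items():
        child = ET.SubElement(op, "{%s}%s" % (service_ns, name))
        child.text = str(value)
    return ET.tostring(envelope, encoding="unicode")
```

A WS actor would then POST this envelope to the service endpoint and parse the response the same way.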

What might be needed in the future? (continued) Design of a web-based WF client (HTML5, Draw2D, ...)? It could serve as a centralized work and monitoring environment, and an in-situ visualization approach could be an option for monitoring ongoing simulations (e.g. the VTK-based Catalyst). A new provenance collection tool/mechanism might be designed following the Metafor CIM guidelines, with the help of ontologies (RDF, OWL); it could be tightly integrated with the ESM used (an ESMF Attribute-type approach). Interaction with ESGF-type data portals is crucial (model inputs, data processing and visualization workflows, etc.).

More Information

Questions! Contact: Ufuk Utku Turunçoğlu u.utku.turuncoglu@be.itu.edu.tr