Kepler Scientific Workflow and Climate Modeling
Ufuk Turuncoglu, Istanbul Technical University, Informatics Institute, Computational Science and Engineering Dept.
Cecelia DeLuca and Sylvia Murphy, NOAA/ESRL, NESII
03/06/2014, IS-ENES Workshop on Workflows in Earth System Modeling
Outline
- Motivation
- Our Approach
- CCSM Workflow Components
- Provenance and Metadata Collection
- Use Case
  - Basic CCSM configuration (all components are active)
  - Computing Platform: TeraGrid and Cluster (OTP)
- Experiences
- Plans
Common Problems in Earth System Modeling
Changing the science in complex ESMs
- Shift from standalone model components (atm, ocn, lnd, etc.) to multi-component ESMs (high resolution in time and space)
- Modeling systems are hard to use (they must be built and run on many kinds of computational resources)
- Increased volume of data (hard to transfer, store and process)
- Increasing need to test the models regularly (especially in the development stage) using standardized test cases to find bugs, check performance and test the integrity of the modeling system
- Increasing need to reproduce results (on different computational platforms)
- Numerous parameter changes that are hard to record and track (ensemble runs, model sensitivity
Common Problems in Earth System Modeling
Changing the science in complex HPCs
- Many technologies are involved, each with its own learning curve (operating systems, job schedulers, parallel file systems, programming and usage of GPU systems)
- Developing and testing ESMs on HPC systems is hard and time consuming (different authentication systems, data transfer protocols and software variants: compilers, libraries, etc.)
Proposed Approach
Integrating ESMs with Scientific Workflow (SWF) Applications
- The SWF application can act as an abstraction layer (it hides the details of the different technologies and computing environments)
- It can orchestrate the underlying ESMs, pre/postprocessing tools and data repositories
- It can collect metadata and provenance information in a standardized and automated way
- It creates a work environment that makes information exchange easy, standardized and efficient
Prototype System (collaboration with NOAA, ESMF Team)
- Create a self-describing earth system model (NCAR's CESM)
Components of Prototype Application
- The prototype system is tested in both a grid environment (TeraGrid) and a conventional cluster (NCAR's Bluefire - SSO)
Kepler WF and Modifications
Kepler WF
- Open source
- Platform independent (Java)
- Supports different models of computation (MoC) using directors
- Modular (new modules can be designed using Java)
Modifications: new module earth
- Grid-based actors
- CESM actors
- Post-processing actors
Provenance / Metadata Collection Tool
- CCSM4 (= CESM) is a multi-component global ESM, which makes it complicated to collect provenance and metadata information
- Hierarchical (multi-layered) approach
- The main output format is XML
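The layered approach with XML output can be illustrated with a minimal Python sketch. It is not the tool presented in the talk: the element names (`provenance`, `layer`, `item`) and the sample values are assumptions chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def provenance_to_xml(layers):
    """Serialize {layer_name: {key: value}} into one XML string.

    The element and attribute names are hypothetical; the real tool's
    schema is not given in the slides.
    """
    root = ET.Element("provenance")
    for layer_name, items in layers.items():
        # One <layer> element per provenance layer (system, application, ...)
        layer = ET.SubElement(root, "layer", name=layer_name)
        for key, value in items.items():
            item = ET.SubElement(layer, "item", name=key)
            item.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(provenance_to_xml({
        "system": {"os_type": "Linux"},
        "application": {"compiler": "ifort"},
    }))
```

Keeping each layer in its own element lets the system and application layers be collected by separate scripts and merged afterwards.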
Conceptual CCSM Workflow
Kepler CCSM Workflow: TeraGrid
Workflow diagram annotations:
- build model
- collect provenance info
- creates WS-GRAM XML job description files automatically
- run model
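As a rough illustration of generating a job description file automatically, the Python sketch below builds a small XML document. The element names (`job`, `executable`, `directory`, `argument`, `stdout`, `stderr`, `count`, `jobType`) follow the GT4 WS-GRAM job description layout as recalled here and should be checked against the Globus documentation; the paths and the function name are hypothetical, and this is not the code used in the prototype.

```python
import xml.etree.ElementTree as ET


def make_job_description(executable, directory, arguments=(), count=1, job_type="mpi"):
    """Build a simplified WS-GRAM style job description as an XML string.

    Only a small subset of elements is produced; names are assumed from
    the GT4 job description format and are not taken from the slides.
    """
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = executable
    ET.SubElement(job, "directory").text = directory
    for arg in arguments:
        ET.SubElement(job, "argument").text = str(arg)
    # Hypothetical choice: write stdout/stderr into the run directory
    ET.SubElement(job, "stdout").text = f"{directory}/stdout"
    ET.SubElement(job, "stderr").text = f"{directory}/stderr"
    ET.SubElement(job, "count").text = str(count)
    ET.SubElement(job, "jobType").text = job_type
    return ET.tostring(job, encoding="unicode")


if __name__ == "__main__":
    print(make_job_description("/path/to/model.exe", "/scratch/run", count=64))
```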
Kepler CCSM Workflow: Cluster (OTP)
Workflow diagram annotations:
- build model
- run model
- collect provenance info
- workflow-wide global parameters
- Modified WF from TeraGrid to
Examples of Collected Provenance Information
System Layer: information related to the OS
- A Perl script
- OS type, patch version and level, kernel
- This script is also responsible for defining environment variables for the commands that are triggered by the workflow
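The talk's system-layer collector is a Perl script that is not shown. As a stand-in, the Python sketch below gathers comparable information with the standard `platform` module; the dictionary keys and the default list of environment variables are assumptions made for illustration.

```python
import os
import platform


def collect_system_provenance(env_names=("PATH", "HOME")):
    """Return a dict describing the OS and selected environment variables.

    Key names are hypothetical; they are not the fields of the original tool.
    """
    uname = platform.uname()
    return {
        "os_type": uname.system,
        "kernel_release": uname.release,
        "kernel_version": uname.version,
        "machine": uname.machine,
        "hostname": uname.node,
        # Record only the requested variables; unset ones map to ""
        "environment": {name: os.environ.get(name, "") for name in env_names},
    }


if __name__ == "__main__":
    for key, value in collect_system_provenance().items():
        print(f"{key}: {value}")
```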
Examples of Collected Provenance Information
Application Layer: ESMF Attribute Class
- Information related to exchange fields
- It can be integrated at the model (atm, ocn, ...) and/or driver level
Examples of Collected Provenance Information
Application Layer: information related to the build stage
- A Python script
- Compiler type and version, compiler flags used, list of environment variables and their values
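The original build-stage script is not reproduced in the slides. The sketch below shows one way such a collector could look, under stated assumptions: the compiler is queried with `--version` (some compilers use a different flag), and the environment variable prefixes and dictionary keys are hypothetical.

```python
import os
import shutil
import subprocess


def compiler_version(compiler):
    """Return the first line of `<compiler> --version`, or None if unavailable."""
    path = shutil.which(compiler)
    if path is None:
        return None
    try:
        result = subprocess.run([path, "--version"], capture_output=True,
                                text=True, timeout=10)
    except (OSError, subprocess.SubprocessError):
        return None
    lines = (result.stdout or result.stderr).splitlines()
    return lines[0].strip() if lines else None


def collect_build_provenance(compiler, flags, env_prefixes=("FC", "CC", "MPI", "NETCDF")):
    """Collect compiler, flags and matching environment variables into a dict."""
    env = {k: v for k, v in sorted(os.environ.items())
           if k.startswith(tuple(env_prefixes))}
    return {
        "compiler": compiler,
        "compiler_version": compiler_version(compiler),
        "compiler_flags": list(flags),
        "environment": env,
    }


if __name__ == "__main__":
    print(collect_build_provenance("gfortran", ["-O2", "-fconvert=big-endian"]))
```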
Follow-up Workflows
- The follow-up workflow is triggered by another workflow
- Modified Kepler loader
- Job description in XML form
Lessons Learned in Prototype WF Design
Was Kepler scientific WF a good choice?
- Easy to develop new actors (Java, object-oriented design)
- Different director options (sequential, parallel)
- The provenance module is not strong
- There is no mechanism to keep track of workflow versions (an SVN or Git type of approach is needed)
- The checkpoint/restart mechanism?
Integrating model components with WFs is not easy
- Models have non-standardized interfaces (e.g. ASCII formatted configuration files or namelists; CESM uses XML)
- The current approach requires developing model-specific actors to configure, build and run the model
- It is not a generic solution and sensitive to change in
Lessons Learned in Prototype WF Design
WFs and integration with computing platforms
- Job submission, management and monitoring via various types of job schedulers (LSF, OpenPBS, etc.) and different authentication systems (OTP, VPN, etc.)
- A set of global environment variables (TG_CLUSTER_HOME, TG_CLUSTER_PFS, etc.) that point to user scratch, temp and archive directories might help to create generic workflows
- Controlling the whole environment (model, computing system, processing and visualization tools) from external WF systems is not efficient (Web Services, client/server?)
- In general, a WF needs continuous interaction with the computing platform (monitoring, error checking, etc.)
Creating an end-to-end WF system for ESS is a giant task. It needs more collaboration!
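The idea of using global environment variables to keep workflows generic can be sketched in Python as follows. Only the variable names TG_CLUSTER_HOME and TG_CLUSTER_PFS come from the slide; the function name, the case name and the subdirectory layout (`run`, `archive`) are assumptions for illustration.

```python
import os
from pathlib import Path


def resolve_run_directories(case_name, environ=None):
    """Derive per-case directories from platform-wide environment variables.

    The workflow itself never hard-codes a site-specific path; it relies on
    the variables being defined on each computing platform.
    """
    environ = os.environ if environ is None else environ
    missing = [n for n in ("TG_CLUSTER_HOME", "TG_CLUSTER_PFS") if n not in environ]
    if missing:
        raise KeyError(f"undefined environment variables: {', '.join(missing)}")
    home = Path(environ["TG_CLUSTER_HOME"])
    scratch = Path(environ["TG_CLUSTER_PFS"])
    return {
        "case": home / case_name,                    # hypothetical layout
        "run": scratch / case_name / "run",          # hypothetical layout
        "archive": scratch / case_name / "archive",  # hypothetical layout
    }


if __name__ == "__main__":
    demo_env = {"TG_CLUSTER_HOME": "/home/user", "TG_CLUSTER_PFS": "/scratch/user"}
    for name, path in resolve_run_directories("test_case", demo_env).items():
        print(f"{name}: {path}")
```

Passing the environment as an argument makes the same function usable on any platform and easy to check without touching the real environment.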
What might be needed in the future?
We need a new approach (just thoughts ...)
- Integrating models with standardized web services (WS) and hooking them up to the WS actors in workflow applications
  - The client can be simplified this way
  - No need to create custom actors for every model and external tool
  - Kepler has a set of WS actors that can be used to interact with SOAP services
  - The ESMF web services extension (ESMF-WS) and OpenMI might help to create web-service-enabled model components
http://www.teragridforum.org/mediawiki/index.php?title=gateway_workflow_survey
What might be needed in the future?
We need a new approach (just thoughts ...)
- Design of a web-based WF client (HTML5, Draw2D, ...)?
  - It can be used as a centralized work and monitoring environment
  - An in-situ visualization approach can be an option to monitor ongoing simulations (e.g. VTK-based Catalyst)
- A new provenance collection tool / mechanism might be designed following the guidelines of the Metafor CIM
  - With the help of ontologies (RDF, OWL)
  - It can be tightly integrated into the ESS in use (ESMF Attribute type approach)
- Interaction with ESGF-type data portals is crucial (model inputs, data processing and visualization workflows, etc.)
More Information
Questions! Contact: Ufuk Utku Turunçoğlu u.utku.turuncoglu@be.itu.edu.tr