Putting the Archives to Work: Workflow and Metadata-driven Analysis in LTER Science

Wade Sheldon
Georgia Coastal Ecosystems LTER, University of Georgia

Acknowledgements:
- John Porter (Virginia Coast Reserve LTER)
- Duane Costa (LTER Network Office)
- Corinna Gries (North Temperate Lakes LTER)
- Inigo San Gil (LTER Network Office, McMurdo Dry Valleys LTER)

Background
- Ecological science in the LTER (Long Term Ecological Research) Network is a data-intensive effort covering vast temporal and spatial scales
- The practice of informatics is critical for:
  - Managing LTER data for analysis
  - Curating LTER data for accuracy and accessibility
  - Archiving LTER data for interpretation and use by future scientists
- LTER sites have adopted many informatics standards and practices to meet these goals, including EML (Ecological Metadata Language) metadata and PASTA (Provenance Aware Synthesis Tracking Architecture)
- The LTER EML implementation targets data integration, not just resource description

EML Metadata in LTER
- Integration-level EML is comprehensive:
  - Data discovery metadata: title, abstract, keywords, personnel
  - Research context metadata: study description, methods, protocols, project description
  - Data set coverage metadata: temporal, spatial, taxonomic
  - Physical metadata for entities (e.g., tables): file format, delimiters, terminators, header format, download URL
  - Attribute metadata: data types, names, descriptions, units, codes, Q/C limits
- Supports software-mediated discovery, download, and parsing of entities, and integration with other data
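
As a rough illustration of why physical and attribute metadata enable software-mediated parsing, the Python sketch below reads the delimiter, header-line count, download URL, and per-attribute names, definitions, and units out of an EML document using only the standard library. The embedded EML fragment is hypothetical and heavily trimmed (not an actual LTER package), and the sketch handles only delimited text tables, skipping other entity types.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EML fragment for illustration only.
# EML child elements are unqualified; only the root carries a namespace.
SAMPLE_EML = """<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" packageId="knb-lter-xyz.1.1" system="knb">
  <dataset>
    <title>Example water temperature data</title>
    <dataTable>
      <entityName>water_temp.csv</entityName>
      <physical>
        <objectName>water_temp.csv</objectName>
        <dataFormat>
          <textFormat>
            <numHeaderLines>1</numHeaderLines>
            <attributeOrientation>column</attributeOrientation>
            <simpleDelimited><fieldDelimiter>,</fieldDelimiter></simpleDelimited>
          </textFormat>
        </dataFormat>
        <distribution><online><url>https://example.org/water_temp.csv</url></online></distribution>
      </physical>
      <attributeList>
        <attribute>
          <attributeName>Date</attributeName>
          <attributeDefinition>Sampling date</attributeDefinition>
        </attribute>
        <attribute>
          <attributeName>Temp</attributeName>
          <attributeDefinition>Water temperature</attributeDefinition>
          <measurementScale><ratio><unit><standardUnit>celsius</standardUnit></unit></ratio></measurementScale>
        </attribute>
      </attributeList>
    </dataTable>
  </dataset>
</eml:eml>"""


def summarize_entities(eml_text: str) -> list[dict]:
    """Collect what a loader needs to parse each delimited dataTable."""
    root = ET.fromstring(eml_text)
    entities = []
    for table in root.iter("dataTable"):
        fmt = table.find("physical/dataFormat/textFormat")
        if fmt is None:
            continue  # this sketch only handles text-format tables
        entities.append({
            "name": table.findtext("entityName"),
            "url": table.findtext("physical/distribution/online/url"),
            "header_lines": int(fmt.findtext("numHeaderLines", default="0")),
            "delimiter": fmt.findtext("simpleDelimited/fieldDelimiter", default=","),
            "attributes": [
                {
                    "name": attr.findtext("attributeName"),
                    "definition": attr.findtext("attributeDefinition"),
                    # None when the attribute declares no unit
                    "unit": attr.findtext(".//standardUnit") or attr.findtext(".//customUnit"),
                }
                for attr in table.iter("attribute")
            ],
        })
    return entities


if __name__ == "__main__":
    for entity in summarize_entities(SAMPLE_EML):
        print(entity["name"], entity["delimiter"], [a["name"] for a in entity["attributes"]])
```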

EML Generation
- LTER sites have developed many approaches for generating EML from site catalogs:
  - Morpho editor
  - XML editors (oXygen, XMLSpy)
  - Custom application frameworks/databases
- Two software systems are emerging as community tools used at multiple sites and outside LTER:
  - Metabase Metadata Management System (Metabase)
  - Drupal Ecological Information Management System (DEIMS)

Metabase MMS
- Generalized RDBMS for managing environmental metadata (GCE, 2002):
  - Personnel
  - Site geography (study area polygons, point locations)
  - Instrumentation
  - Research projects
  - Data sets (studies, methods, entities, attributes, files)
- Linked to bibliographic and taxonomic databases
- Supports automatic cross-links between people, research, publications, and data
- RESTful web services for mapping, personnel, and data set descriptions
- Automated metadata generation for data sets, with cross-links between all related information
- Used by GCE, CWT, MCR, SBC, and SREL (HBR adopting)
- http://gce-lter.marsci.uga.edu/public/app/resource_details.asp?id=434
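
A minimal sketch of how automated EML generation from an RDBMS can work, assuming a hypothetical two-table schema that is far simpler than Metabase's actual design: attribute rows are queried in column order and emitted as EML `attribute` elements. The output is an illustrative fragment only; it omits elements that schema-valid EML requires (for example `entityName`, `physical`, and `numericDomain`).

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical two-table schema; Metabase's real schema is far richer.
SCHEMA = """
CREATE TABLE datasets (dataset_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE attributes (
    dataset_id INTEGER NOT NULL REFERENCES datasets(dataset_id),
    seq        INTEGER NOT NULL,   -- column order in the data table
    name       TEXT NOT NULL,
    definition TEXT NOT NULL,
    unit       TEXT                -- NULL for non-numeric attributes
);
"""


def dataset_to_eml(conn: sqlite3.Connection, dataset_id: int) -> ET.Element:
    """Build an (incomplete) EML <dataset> fragment from database rows."""
    row = conn.execute(
        "SELECT title FROM datasets WHERE dataset_id = ?", (dataset_id,)
    ).fetchone()
    if row is None:
        raise KeyError(dataset_id)
    dataset = ET.Element("dataset")
    ET.SubElement(dataset, "title").text = row[0]
    attr_list = ET.SubElement(ET.SubElement(dataset, "dataTable"), "attributeList")
    rows = conn.execute(
        "SELECT name, definition, unit FROM attributes "
        "WHERE dataset_id = ? ORDER BY seq",
        (dataset_id,),
    )
    for name, definition, unit in rows:
        attr = ET.SubElement(attr_list, "attribute")
        ET.SubElement(attr, "attributeName").text = name
        ET.SubElement(attr, "attributeDefinition").text = definition
        if unit is not None:
            # Attributes with a unit get a ratio scale with a standard unit.
            ratio = ET.SubElement(ET.SubElement(attr, "measurementScale"), "ratio")
            ET.SubElement(ET.SubElement(ratio, "unit"), "standardUnit").text = unit
    return dataset


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO datasets VALUES (1, 'Example water temperature data')")
    conn.executemany(
        "INSERT INTO attributes VALUES (1, ?, ?, ?, ?)",
        [(1, "Date", "Sampling date", None), (2, "Temp", "Water temperature", "celsius")],
    )
    print(ET.tostring(dataset_to_eml(conn, 1), encoding="unicode"))
```

Because the XML is generated from the same rows that drive the catalog, edits made once in the database propagate to every regenerated EML document.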

DEIMS
- IMS built on the popular Drupal CMS framework (LNO, 2010)
- A Drupal installation profile for storing, editing, and sharing data and information about biological and ecological research
- Provides user-friendly forms to describe all contextual information about your data
- Produces EML metadata to register data with metadata clearinghouses (LTER PASTA, ORNL DAAC, KNB)
- Allows you to query external databases using the Data Explorer feature
- Used by MCM, SEV, JRN, LUQ, the University of Michigan Biological Station, and others (https://www.drupal.org/project/deims)

Metadata-driven Analysis
- PASTA has simplified using EML-described data for metadata-driven analysis and workflows:
  - Unified repository for LTER data
  - Quality checks ensure data-metadata conformity
  - API for retrieving versioned metadata and data
  - Trigger mechanism for running workflows when data packages change
- Variety of workflow tools in use:
  - Kepler
  - R, SAS, SPSS statistical software
  - MATLAB technical computing software
  - GCE Data Toolbox (MATLAB-based framework)
- Web services (VCR) enabled on the LTER Data Portal for generating data loaders for R, SAS, SPSS, and MATLAB
- PASTA/EML support included in the GCE Data Toolbox for interactive and programmatic data mining and workflows
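
PASTA identifies each versioned data package as `scope.identifier.revision` (for example `knb-lter-vcr.26.20`, as in the VCR service URLs). The sketch below parses such an ID and builds retrieval URLs for a workflow. The base URL and path layout are assumptions based on the PASTA Data Package Manager REST API and should be checked against its documentation; the rule that scopes contain no dots is also an assumption.

```python
from dataclasses import dataclass

# Assumed base URL and path layout for the PASTA Data Package Manager;
# verify against the current PASTA API documentation before relying on them.
PASTA_BASE = "https://pasta.lternet.edu/package"


@dataclass(frozen=True)
class PackageId:
    scope: str
    identifier: int
    revision: int

    @classmethod
    def parse(cls, package_id: str) -> "PackageId":
        """Split 'scope.identifier.revision'; raises ValueError if malformed."""
        # Assumes the scope itself contains no dots, so split from the right.
        scope, identifier, revision = package_id.rsplit(".", 2)
        return cls(scope, int(identifier), int(revision))

    def metadata_url(self) -> str:
        """URL of the EML document for this exact revision."""
        return f"{PASTA_BASE}/metadata/eml/{self.scope}/{self.identifier}/{self.revision}"

    def data_url(self, entity_id: str) -> str:
        """URL of one data entity within this revision."""
        return f"{PASTA_BASE}/data/eml/{self.scope}/{self.identifier}/{self.revision}/{entity_id}"

    def revisions_url(self) -> str:
        """URL listing all revisions of this package."""
        return f"{PASTA_BASE}/eml/{self.scope}/{self.identifier}"


if __name__ == "__main__":
    pid = PackageId.parse("knb-lter-vcr.26.20")
    print(pid.metadata_url())
```

Pinning a workflow to an explicit revision keeps results reproducible, while polling the revision list is one way a workflow could notice that newer data are available.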

Kepler
- Kepler supports downloading data via REST URLs
- EML metadata actor loads entities and configures ports for attributes
- Demo workflows for downloading PASTA data and for ClimDB export
- Demo: http://intranet2.lternet.edu/content/video-and-presentations-2012-nsf-lter-mini-symposium-now-available

R, SAS, SPSS and MATLAB
- EML is transformed via XSLT to generate native data acquisition programs for the target platform (documented source code)
- Based on the R statistical program generator from TERN (I-LTER)
- Programs download compatible entities and load data into appropriate structures from within the analysis environment
- Attribution and research origin metadata are included as code comments (R, SAS, SPSS) or in the data structure itself (MATLAB)
- Flexible: can be run locally or via RESTful web services
- Many benefits to users:
  - EML, source code, and entities (e.g., CSV files) can be saved and re-used
  - Users can debug the code in their native IDE if the data have incompatible features
  - Generated code can be modified and extended as part of a custom workflow
  - Leverages tools researchers use every day!
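
The generators described above use XSLT to turn EML into documented loader programs. The following Python sketch swaps XSLT for plain string templating to show the same idea on a small scale: from hypothetical entity metadata (URL, header lines, delimiter, attribute names, definitions, units) it emits an R `read.csv` call with attribution carried as comments. It is not the TERN or VCR generator, and a real generator handles many more cases (date formats, missing-value codes, factor levels).

```python
def r_string(value: str) -> str:
    """Quote text as an R string literal, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def one_line(text: str) -> str:
    """Collapse whitespace so metadata text fits on a single R comment line."""
    return " ".join(str(text).split())


def generate_r_loader(title: str, entity: dict, var: str = "dt1") -> str:
    """Emit R code that downloads and reads one delimited entity.

    `entity` is a hypothetical dict with keys: name, url, header_lines,
    delimiter, and attributes (each with name, definition, optional unit).
    """
    lines = [f"# Title: {one_line(title)}", f"# Entity: {one_line(entity['name'])}"]
    # Attribute metadata travels with the code as comments.
    for attr in entity["attributes"]:
        unit = f" ({attr['unit']})" if attr.get("unit") else ""
        lines.append(f"# {attr['name']}: {one_line(attr['definition'])}{unit}")
    col_names = ", ".join(r_string(a["name"]) for a in entity["attributes"])
    lines += [
        f"{var} <- read.csv({r_string(entity['url'])}, header = FALSE,",
        f"  skip = {int(entity['header_lines'])}, sep = {r_string(entity['delimiter'])},",
        f"  col.names = c({col_names}))",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    example = {
        "name": "water_temp.csv",
        "url": "https://example.org/water_temp.csv",
        "header_lines": 1,
        "delimiter": ",",
        "attributes": [
            {"name": "Date", "definition": "Sampling date", "unit": None},
            {"name": "Temp", "definition": "Water temperature", "unit": "celsius"},
        ],
    }
    print(generate_r_loader("Example water temperature data", example))
```

The generated call passes the URL straight to `read.csv`, which R accepts as a file argument, so the resulting script both downloads and loads the table.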

R, SAS, SPSS, MATLAB
- R example: http://www.vcrlter.virginia.edu/webservice/pastaprog/knb-lter-vcr.26.20.r

R, SAS, SPSS, MATLAB
- MATLAB example: http://www.vcrlter.virginia.edu/webservice/pastaprog/knb-lter-vcr.26.20.m
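
Both example URLs follow the pattern base URL + package ID + a language suffix (`.r` for R, `.m` for MATLAB). A small helper could build and fetch such URLs, assuming that pattern holds for other package IDs. The SAS and SPSS suffixes are not shown on the slides and are therefore omitted, the package-ID regular expression is an assumption, and the service may no longer be online.

```python
import re
from urllib.request import urlopen

# Base URL and suffixes taken from the two example links on the slides;
# SAS and SPSS suffixes are not shown there, so they are not guessed here.
PASTAPROG_BASE = "http://www.vcrlter.virginia.edu/webservice/pastaprog/"
SUFFIXES = {"R": "r", "MATLAB": "m"}
# Assumed package-ID shape: scope (letters, digits, hyphens).identifier.revision
PACKAGE_ID = re.compile(r"[A-Za-z0-9-]+\.\d+\.\d+")


def pastaprog_url(package_id: str, platform: str) -> str:
    """Build the code-generator URL for one package and target platform."""
    if not PACKAGE_ID.fullmatch(package_id):
        raise ValueError(f"not a scope.identifier.revision ID: {package_id!r}")
    try:
        suffix = SUFFIXES[platform]
    except KeyError:
        raise ValueError(f"unsupported platform: {platform!r}") from None
    return f"{PASTAPROG_BASE}{package_id}.{suffix}"


def fetch_program(package_id: str, platform: str, timeout: float = 30.0) -> str:
    """Download generated loader source; needs network access and a live service."""
    with urlopen(pastaprog_url(package_id, platform), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(pastaprog_url("knb-lter-vcr.26.20", "MATLAB"))
```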

R, SAS, SPSS, MATLAB
- Running the code downloads files and loads data
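
A Python analogue of what a generated loader does once a file is in hand: skip the header lines declared in the EML physical metadata, split on the declared delimiter, check each record's field count against the attribute list, and convert columns declared numeric. The function name and the rule of treating empty numeric fields as missing are illustrative assumptions, not behavior taken from the generated programs.

```python
import csv
import io


def load_delimited(text: str, header_lines: int, delimiter: str,
                   names: list[str], numeric: set[str]) -> dict[str, list]:
    """Parse delimited text into columns using EML-style physical metadata."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    columns: dict[str, list] = {name: [] for name in names}
    for i, row in enumerate(reader):
        if i < header_lines or not row:
            continue  # skip declared header records and blank lines
        if len(row) != len(names):
            raise ValueError(f"record {i + 1}: expected {len(names)} fields, got {len(row)}")
        for name, value in zip(names, row):
            if name in numeric:
                # Empty numeric fields become None (assumed missing-value rule).
                columns[name].append(float(value) if value.strip() else None)
            else:
                columns[name].append(value)
    return columns


if __name__ == "__main__":
    sample = "Date,Temp\n2012-07-01,28.5\n2012-07-02,\n"
    print(load_delimited(sample, 1, ",", ["Date", "Temp"], {"Temp"}))
```

Failing loudly on a field-count mismatch mirrors the point that data-metadata conformity is what makes unattended loading safe.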

LTER Data Portal Web Services
- Links to code generator services on the summary page for every LTER data set in PASTA

LTER Data Portal Web Services
- Provides code (copy/paste or download) and instructions

GCE Data Toolbox
- MATLAB framework for metadata-based processing, quality control, and analysis of environmental data (see ESIP poster)
- Imports EML-described data from the local file system, PASTA, KNB, site catalogs, ...
- Leverages generic EML-to-MATLAB XSLT
- Complete metadata imported along with entities into the data model
- Supports authentication and entity selection (GUI or workflow)
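
To illustrate what importing complete metadata along with entities buys, here is a hypothetical Python container in which column descriptions and units travel with the values, so a derived subset keeps them. It is not the GCE Data Toolbox's MATLAB data structure, only a sketch of the idea.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class Column:
    name: str
    description: str
    units: Optional[str]
    values: list


@dataclass
class MetadataTable:
    """Hypothetical container: data set metadata and columns kept together."""
    title: str
    source: str  # e.g. the package ID the entity was imported from
    columns: list[Column] = field(default_factory=list)

    def column(self, name: str) -> Column:
        for col in self.columns:
            if col.name == name:
                return col
        raise KeyError(name)

    def subset(self, names: list[str]) -> "MetadataTable":
        """Derived table that keeps title, source, and per-column metadata.

        Values are copied so edits to the subset do not alter the original.
        """
        cols = [replace(self.column(n), values=list(self.column(n).values)) for n in names]
        return MetadataTable(self.title, self.source, cols)


if __name__ == "__main__":
    table = MetadataTable("Example", "knb-lter-xyz.1.1", [
        Column("Date", "Sampling date", None, ["2012-07-01"]),
        Column("Temp", "Water temperature", "celsius", [28.5]),
    ])
    print(table.subset(["Temp"]))
```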

GCE Data Toolbox
- Wide range of data visualization, transformation, and integration tools for developing workflows
- Derived data contain complete metadata, QA/QC information, and processing history
- Export results to text files, MATLAB arrays, RDBMS, CUAHSI ODM, KML, HTML
- EML and text files can be uploaded to PASTA
- Open source software library for MATLAB: https://gce-svn.marsci.uga.edu/trac/gce_toolbox
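
A minimal sketch of a derived-data step that preserves QA/QC information and processing history, assuming a hypothetical dict-based table and a made-up flag code `Q` for out-of-range values (the GCE Data Toolbox has its own flagging rules and codes). Range limits of this kind correspond to the Q/C limits that integration-level EML can carry in attribute metadata.

```python
from datetime import datetime, timezone


def flag_out_of_range(values: list, low: float, high: float, code: str = "Q") -> list[str]:
    """One flag per value: `code` if outside [low, high], else ''. Missing values get ''."""
    return [code if v is not None and not (low <= v <= high) else "" for v in values]


def apply_range_check(table: dict, column: str, low: float, high: float) -> dict:
    """Return a derived copy with flags and a history entry; the input is not modified."""
    flags = flag_out_of_range(table["columns"][column], low, high)
    n_flagged = sum(1 for f in flags if f)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return {
        "metadata": dict(table["metadata"]),
        "columns": {name: list(vals) for name, vals in table["columns"].items()},
        "flags": {**table.get("flags", {}), column: flags},
        "history": table.get("history", []) + [
            f"{stamp}: flagged {n_flagged} value(s) in {column} outside [{low}, {high}]"
        ],
    }


if __name__ == "__main__":
    raw = {"metadata": {"title": "Example"}, "columns": {"Temp": [28.5, 55.0, None, -3.0]}}
    derived = apply_range_check(raw, "Temp", 0.0, 40.0)
    print(derived["flags"]["Temp"], derived["history"])
```

Returning a new table rather than editing in place keeps the original data intact, and the history list accumulates across successive steps so any export can carry a record of how the values were produced.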

Conclusion
- Currently just scratching the surface of what EML and PASTA can do
- The vision of machine-readable data and metadata-enabled analysis and integration is now a reality
- Use cases and collaborations inside and outside LTER are welcome