EarthCube and Cyberinfrastructure for the Earth Sciences: Lessons and Perspective from OpenTopography

Christopher Crosby, San Diego Supercomputer Center
J Ramon Arrowsmith, Arizona State University
Chaitan Baru, San Diego Supercomputer Center

INTRODUCTION

The National Science Foundation-funded OpenTopography Facility (http://www.opentopography.org) is an example of a project that is successfully developing production cyberinfrastructure that serves a diverse community of users. By leveraging emerging cyberinfrastructure technologies such as portal-based data access, service oriented architectures, high-performance database systems, open source software libraries, optimized processing algorithms, and high-performance computing platforms, OpenTopography has dramatically improved internet-based access to massive high-resolution lidar topography datasets. OpenTopography's growing user community of several thousand scientists, educators, students, government agency staff, and private sector users illustrates that cyberinfrastructure-based geospatial data access systems can have a significant impact by democratizing access to these massive datasets. OpenTopography's success illustrates the opportunities that exist in applying cyberinfrastructure resources to geoscience data management and processing. With its vision and systems prototyped as an R&D project under the NSF GEON Grid project, OpenTopography is also an excellent example of the process through which early cyberinfrastructure research efforts evolve into stable, reliable, production community resources.

DATA AND CHALLENGES

High-resolution observations of the Earth's surface, collected with airborne and terrestrial lidar (light detection and ranging), high-resolution imagery (satellite and airborne), and multibeam bathymetry technology, are revolutionizing the way we study the geomorphic, tectonic, biologic and anthropogenic processes acting along the Earth's surface. These data are fundamental tools for research on topics ranging from earthquake hazards to urban systems modeling. They also have significant implications for earth science education and outreach because, when visualized, they provide an accurate and detailed three-dimensional digital representation of landforms, natural hazards and processes, and the built environment. Repeat measurements can be differenced to assess changes in the surface.

However, along with the potential of these data comes an increase in the volume and complexity of data that must be efficiently managed, archived, distributed, processed and integrated in order to be of use to the community. These massive datasets are often difficult to manage and pose significant distribution challenges when trying to provide access to a large scientific user community. Furthermore, the data can be technically challenging to work with and may require specific software and computing resources that are not readily available to many users.

TIERED DATA ACCESS

A key aspect of OpenTopography is a tiered approach to data access (see figure at left). To facilitate access to lidar data for users with varying levels of expertise and computing resources, OpenTopography provides a variety of data products.

At one end of the spectrum, expert users can access entire multi-terabyte point cloud datasets, often many billions of individual points, and run a particular processing routine with customized parameter settings. The system also permits users to produce derivative products, such as common geomorphic metrics like slope, as well as visualizations of those products. This on-demand access to data and dynamically generated products is enabled by co-locating the raw data with the high-performance processing resources available at the San Diego Supercomputer Center. By leveraging remote data and processing co-location, users are able to explore and optimize data products for their scientific application.

In the middle of the spectrum, pre-computed raster data products (DEMs), at resolutions optimal for the datasets and organized into manageable tiles, are popular data products. These products are easily imported into desktop Geographic Information System (GIS) software packages and require no lidar-specific knowledge to analyze.

At the other end of the spectrum, non-expert users can access pre-processed, network-linked Google Earth images that provide streaming access to hundreds of gigabytes of lidar-derived imagery for synoptic data browsing. These products enable non-expert users to quickly and seamlessly access large volumes of scientific data.

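Two of the grid operations mentioned so far, differencing repeat surveys and deriving slope from a DEM, can be sketched in a few lines. The following is a minimal illustration only: it assumes a DEM held in memory as a list of rows on a square grid, and uses simple finite differences. The function names and the differencing scheme are assumptions made for this example, not a description of the algorithms OpenTopography actually runs.

```python
import math


def difference(newer, older):
    """Cell-by-cell elevation change between two co-registered DEMs."""
    if len(newer) != len(older) or any(len(a) != len(b) for a, b in zip(newer, older)):
        raise ValueError("DEMs must have the same shape")
    return [[n - o for n, o in zip(row_n, row_o)] for row_n, row_o in zip(newer, older)]


def slope_degrees(dem, cell_size):
    """Slope angle in degrees for every cell of a DEM (list of rows).

    Central differences are used in the interior and one-sided
    differences at the edges (an illustrative choice).
    """
    rows, cols = len(dem), len(dem[0])
    if rows < 2 or cols < 2:
        raise ValueError("DEM must be at least 2 x 2 cells")
    out = []
    for r in range(rows):
        r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
        row_out = []
        for c in range(cols):
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            # Elevation gradient along columns (x) and rows (y).
            dz_dx = (dem[r][c1] - dem[r][c0]) / ((c1 - c0) * cell_size)
            dz_dy = (dem[r1][c] - dem[r0][c]) / ((r1 - r0) * cell_size)
            row_out.append(math.degrees(math.atan(math.hypot(dz_dx, dz_dy))))
        out.append(row_out)
    return out


# A synthetic plane that rises 2 m per 2 m cell in the x direction.
plane = [[2.0 * c for c in range(4)] for _ in range(3)]
print(slope_degrees(plane, cell_size=2.0)[1])
print(difference([[2.0, 3.0]], [[1.0, 1.0]]))
```

On real multi-terabyte point clouds these operations would run on gridded products next to the data, which is the motivation for the co-location described above.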
The overwhelming usage of the Google Earth products demonstrates the impact of this simple yet novel approach to delivering easy-to-use lidar data visualizations to Earth scientists, students, and the general public.

SERVICE ORIENTED ARCHITECTURES

The technical goal of the OpenTopography Facility is to build a modular system that is flexible enough to support the growing needs of the Earth science lidar community (figure below). In particular, we strive to host and provide access to datasets as soon as they become available, and to expose greater application-level functionality to our end users, e.g., generation of custom Digital Elevation Models (DEMs) and hydrological modeling algorithms for generation of derivative products. Furthermore, we envision enabling direct authenticated access to our back-end functionality through simple Web service APIs, so that users may access our data and compute resources via clients other than Web browsers.

Service Oriented Architectures (SOAs) have several features that are beneficial for building cyberinfrastructure solutions for scientific data systems. Through encapsulation of back-end functionality, SOAs reduce the complexity of the processes that are part of the scientific workflow. Applications and data resources can be wrapped as Web services, exposing only a thin API to clients. The implementation details and the back-end infrastructure can be completely hidden from end users and scaled up without any change to the service APIs. Web service APIs also enable interoperability across systems and architectures through the use of open standards such as XML, SOAP and REST. Clients may be written in a variety of languages and run on disparate platforms. Because of these benefits, we have implemented an SOA to accomplish our goal of making OpenTopography an open and scalable community resource.

THE ROLE OF OPEN SOURCE

Development of production community cyberinfrastructure systems must heavily leverage other ongoing software development efforts in the open source community. The OpenTopography system is built upon a (mostly) open source software stack, and many of the fundamental processing capabilities it provides depend on open source libraries. Furthermore, open data and service standards (e.g., OGC) are also necessary to permit system interoperability and data integration.
The SOA approach also permits us to run, within the workflow, critical commercial applications for which we have yet to find or build an appropriate alternative.

CATALOG SERVICES

Metadata catalogs serve an important role in enabling discovery of data products and processing services. OpenTopography is in the process of implementing an OGC CSW-compliant catalog that permits discovery of data hosted by OpenTopography and our federated data partners. Such a catalog service would return metadata about data products available from OpenTopography, but would also provide information about the processing capabilities that OpenTopography offers. For example, a bounding box query would inform a user that, in addition to the Google Earth image overlays, standard DEMs, and point cloud data products available from OpenTopography, custom DEM products and derived products are also supported. This catalog service vision can be extended to computational services that can be accessed independently of data. In the case of OpenTopography, various high-performance gridding services are offered, as well as other independent services that can produce derived products and visualizations from DEMs.

DATA CURATION

Any cyberinfrastructure-based data system must provide robust data curation in order to ensure the longevity and reusability of data. OpenTopography is currently working with the University of California, San Diego Research Cyberinfrastructure Initiative to formalize our data curation and archive procedures. As part of this process, all incoming data products will be captured and archived to the San Diego Supercomputer Center cloud-based data storage system (https://cloud.sdsc.edu/) and issued a Digital Object Identifier (DOI) to ensure the data are persistent and citable. Data products dynamically generated by OpenTopography have their provenance captured through dynamically generated metadata that describes the processing algorithms and parameters applied. Ultimately, we envision allowing users to issue a secondary DOI that describes their custom-generated data product and that could be used to cite the specific data used in their scientific analysis.
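
To make the idea of dynamically generated provenance metadata concrete, the sketch below assembles a small record describing the source dataset, the algorithm, and the parameters of one processing job. Every field name, the algorithm label, and the dataset identifier are hypothetical placeholders chosen for this example; the actual OpenTopography metadata schema is not described here. The fingerprint is an added illustration of how identical processing requests could be recognized, not a feature the text claims.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_provenance(dataset_id, algorithm, parameters, created=None):
    """Assemble a provenance record for one on-demand processing job.

    All field names are hypothetical, chosen only for illustration.
    """
    created = created or datetime.now(timezone.utc)
    # The "recipe": everything needed to reproduce the derived product.
    recipe = {
        "source_dataset": dataset_id,
        "algorithm": algorithm,
        "parameters": dict(parameters),
    }
    # Canonical JSON (sorted keys, fixed separators) so that identical
    # recipes always produce the same fingerprint.
    canonical = json.dumps(recipe, sort_keys=True, separators=(",", ":"))
    fingerprint = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**recipe, "created_utc": created.isoformat(), "fingerprint": fingerprint}


record = build_provenance(
    "example-lidar-dataset",  # placeholder identifier, not a real DOI
    "example-gridding",       # hypothetical algorithm label
    {"resolution_m": 1.0, "search_radius_m": 1.5},
)
print(json.dumps(record, indent=2))
```

A record of this kind, stored alongside the derived product, is the sort of information a secondary DOI for a custom-generated product would need to point to.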

USABILITY AND INTERFACE DESIGN

Although there are numerous informatics challenges associated with building the EarthCube, we believe that attention must always be paid to usability, interface design, and the use of modern web technologies. In a world where the average internet user can find the answer to nearly any question through a simple search box in the middle of a nearly white page, earth science data systems must strive for similar levels of simplicity and usability. OpenTopography has emphasized this theme throughout the development of our system and has made a conscious effort to incorporate modern and familiar web technologies into our user interfaces (see figure below). Although scientific data present significantly more complexity in terms of parameters and query conditions, system builders need not immediately expose this complexity to users. Dynamic, Ajax-based interfaces make it possible to hide advanced query options and processing parameters from novice users, while still allowing advanced users to access the parameters they may wish to manipulate.

Further, role-based portal systems that adjust as a function of user type and permissions, and personalized workspaces that mimic the shopping cart environments that proliferate across e-commerce sites on the Internet, also improve usability and make systems familiar and intuitive. Our portal-based system serves several thousand users per year, some of whom are registered users accessing billions of lidar points, while others are guest users visiting to download Google Earth visualizations and lidar-related sources hosted on the site. In the past year, OpenTopography users have run ~4000 on-demand processing jobs accessing approximately 125 billion points.

Beyond access to data, OpenTopography is also a hub for related training information, software tools (http://opentopography.org/tools), and documentation. We support our user community through short course training, responsive customer support, and an active social media presence (OT Blog, Twitter, and Facebook). OpenTopography has also begun to enable social interaction around data, permitting users to share their dynamically generated data products with colleagues and other members of the OpenTopography system.

CONCLUSIONS

The lessons above stem from the experience of migrating the OpenTopography lidar data access and processing system from an R&D project under the GEON Grid initiative to a development and production (D&P) system that serves a growing and vibrant community of earth science users. OpenTopography's success illustrates the power of applying cyberinfrastructure resources in a practical manner to geoscience data management and processing, and the ability of such systems to democratize access to the next generation of earth science data products.