Building on Existing Communities: the Virtual Astronomical Observatory (and NIST)

Similar documents
VIRTUAL OBSERVATORY TECHNOLOGIES

Technology for the Virtual Observatory. The Virtual Observatory. Toward a new astronomy. Toward a new astronomy

The Virtual Observatory and the IVOA

THE NATIONAL DATA SERVICE(S) & NDS CONSORTIUM A Call to Action for Accelerating Discovery Through Data Services we can Build Ed Seidel

Data Management at NIST

Designing the Future Data Management Environment for [Radio] Astronomy. JJ Kavelaars Canadian Astronomy Data Centre

The Materials Data Facility

Data Centres in the Virtual Observatory Age

Indiana University Research Technology and the Research Data Alliance

ESPAS: How it fits and benefits

DCC Disciplinary Metadata

THE EUCLID ARCHIVE SYSTEM: A DATA-CENTRIC APPROACH TO BIG DATA

The IPAC Research Archives. Steve Groom IPAC / Caltech

A VOSpace deployment: interoperability and integration in big infrastructure

VAPE Virtual observatory Aided Publishing for Education

I data set della ricerca ed il progetto EUDAT

EUDAT- Towards a Global Collaborative Data Infrastructure

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

Cyberinfrastructure Framework for 21st Century Science & Engineering (CIF21)

LASDA: an archiving system for managing and sharing large scientific data

IVOA and European VO Efforts Status and Plans

Mitigating Risk of Data Loss in Preservation Environments

Design and Implementation of the Japanese Virtual Observatory (JVO) system Yuji SHIRASAKI National Astronomical Observatory of Japan

Introduction to Grid Computing

Global Data Sharing The Research Data Alliance

The Research Data Alliance Creating the culture and technology for an international data infrastructure

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

Astronomy Dataverse: enabling astronomer data publishing.

A VO-friendly, Community-based Authorization Framework

Town Hall Meeting on the National and International Data Infrastructure. Jim Warren, NIST Lisa Friedersdorf, NNCO

Data publication and discovery with Globus

Giovanni Lamanna LAPP - Laboratoire d'annecy-le-vieux de Physique des Particules, Université de Savoie, CNRS/IN2P3, Annecy-le-Vieux, France

DSpace Fedora. Eprints Greenstone. Handle System

Scientific Data Curation and the Grid

The Smithsonian/NASA Astrophysics Data System

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

Surveying the Digital Library Landscape

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and

Knowledge-based Grids

BPMN Processes for machine-actionable DMPs

Using VIVO at the Laboratory for Atmospheric and Space Physics (LASP)

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Digital repositories as research infrastructure: a UK perspective

THE EUCLID ARCHIVE SYSTEM: A DATA-CENTRIC APPROACH TO BIG DATA

arxiv: v1 [astro-ph.im] 14 Oct 2016

Mercè Crosas, Ph.D. Chief Data Science and Technology Officer Institute for Quantitative Social Science (IQSS) Harvard

SIA V2 Analysis, Scope and Concepts. Alliance. Author(s): D. Tody (ed.), F. Bonnarel, M. Dolensky, J. Salgado (others TBA in next version)

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Research Data Management & Preservation: A Library Perspective

EUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd

DataONE Enabling Cyberinfrastructure for the Biological, Environmental and Earth Sciences

Paving the Rocky Road Toward Open and FAIR in the Field Sciences

Dataverse and DataTags

Storage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan

EUDAT. Towards a pan-european Collaborative Data Infrastructure

Heliospheric Physics Data and Computing Working Group

Data management and discovery

Horizon 2020 and the Open Research Data pilot. Sarah Jones Digital Curation Centre, Glasgow

EUDAT Common data infrastructure

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

Science Panel Discussion presentation: "A Data Sharing Story"

Science-as-a-Service

Slide 1 & 2 Technical issues Slide 3 Technical expertise (continued...)

DATA SHARING FOR BETTER SCIENCE

National Materials Data Initiatives

CLARIN s central infrastructure. Dieter Van Uytvanck CLARIN-PLUS Tools & Services Workshop 2 June 2016 Vienna

The Data exacell DXC. J. Ray Scott DXC PI May 17, 2016

The Portal Aspect of the LSST Science Platform. Gregory Dubois-Felsmann Caltech/IPAC. LSST2017 August 16, 2017

Research Elsevier

Exploiting Virtual Observatory and Information Technology: Techniques for Astronomy

Science 2.0 VU Big Science, e-science and E- Infrastructures + Bibliometric Network Analysis

Grid Computing. MCSN - N. Tonellotto - Distributed Enabling Platforms

SEAD Data Services. Jim Best Practices in Data Infrastructure Workshop. Cooperative agreement #OCI

Update on Dataverse Dryad-Dataverse Community Meeting. Mercè Crosas, Elizabeth Quigley & Eleni Castro. Data Science > IQSS > Harvard University

Information Technology Infrastructure Committee (ITIC)

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM

Cheshire 3 Framework White Paper: Implementing Support for Digital Repositories in a Data Grid Environment

The Role of Repositories and Journals in the Astronomy Research Lifecycle

NASA and International Open Standards and the Future of Space Weather Studies

Lessons learnt from building the International Virtual Observatory Alliance. Françoise Genova

Demos: DMP Assistant and Dataverse

Research on the Interoperability Architecture of the Digital Library Grid

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona

Richard Marciano Alexandra Chassanoff David Pcolar Bing Zhu Chien-Yi Hu. March 24, 2010

EUDAT & AAI. Daan Broeder MPI for Psycholinguistics

Usage of the Astro Runtime

Building a Materials Data Facility (MDF)

Case Study: CyberSKA - A Collaborative Platform for Data Intensive Radio Astronomy

Software + Services for Data Storage, Management, Discovery, and Re-Use

The Computation and Data Needs of Canadian Astronomy

DOI for Astronomical Data Centers: ESO. Hainaut, Bordelon, Grothkopf, Fourniol, Micol, Retzlaff, Sterzik, Stoehr [ESO] Enke, Riebe [AIP]

The Ohio State University's Knowledge Bank: An Institutional Repository in Practice

This presentation is on issues that span most every digitization project.

Digital Preservation at NARA

USE CASE STUDY. Connecting Data Through Mission. Department of Transportation (DOT) A Product of the Federal CIO Council Innovation Committee

Research Data Repository Interoperability Primer

The Materials Genome Initiative and the Data Infrastructure

Euclid Archive Science Archive System

EUDAT & SeaDataCloud

irods - An Overview Jason Executive Director, irods Consortium CS Department of Computer Science, AGH Kraków, Poland

Transcription:

Building on Existing Communities: the Virtual Astronomical Observatory (and NIST) Robert Hanisch Space Telescope Science Institute Director, Virtual Astronomical Observatory

Data in astronomy 2 ~70 major data centers and observatories with substantial on-line data holdings ~10,000 data resources (catalogs, surveys, archives) Data centers host from a few to ~100s TB each, currently at least 2 PB total Current growth rate ~0.5 PB/yr, increasing Current request rate ~1 PB/yr Future surveys will increase data rates to PB/day For LSST, the telescope is a peripheral to the data system (T. Tyson) How do astronomers navigate through all of this data?

The Virtual Observatory 3 The VO is a data discovery, access, and integration facility Images, spectra, time series Catalogs, databases Transient event notices Software and services Application inter-communication Distributed computing authentication, authorization, process management International coordination collaboration IVOA W3C) and through (a la

Virtual Observatory capabilities 4 Data exchange / interoperability / multi-λ (co-observing) Data Access Layer (SIAP, SSAP / time series) Query and cross-match across distributed databases Cone Search, Table Access Protocol Remote (but managed) access to centralized computing and data storage resources VOSpace, Single-Sign-On (OpenID), SciDrive Transient event notification, scalable to 10 6 messages/night VOEvent Data mining, characterization, classification, statistical analysis VOStat, Data Mining and Exploration toolkit

VO architecture 5

VO architecture 6

VO architecture 7

Key to discovery: Registry 8 Used to discover and locate resources data and services that can be used in a VO application Resource: anything that is describable and identifiable. Besides data and services: organizations, projects, software, standards Registry: a list of resource descriptions Expressed as structured metadata in XML to enable automated processing and searching Metadata based on Dublin Core

Registry framework 9 Local Publishing Registry harvest (pull) OAI/PMH Full Searchable Registry replicate Full Searchable Registry Data Centers Local Publishing Registry search queries Users, applications

Data discovery 10

Data discovery 11

SciDrive: astro-centric cloud storage 12 Controlled data sharing Single sign-on Deployable as virtual machine Automatic Metadata Extraction Extract tabular data from: CSV FITS TIFF Excel Extract metadata from: FITS Image files (TIFF, JPG) Automatically upload tables into relational databases: CasJobs/MyDB SQLShare

The VO concept elsewhere 13 Space Science Virtual Heliophysics Observatory (HELIO) Virtual Radiation Belt Observatory (ViRBO) Virtual Space Physics Observatory (VSPO) Virtual Magnetospheric Observatory (VMO) Virtual Ionosphere Thermosphere Mesosphere Observatory (VITMO) Virtual Solar-Terrestrial Observatory (VSTO) Virtual Sun/Earth Observatory (VSEO) Virtual Solar Observatory Planetary Science Virtual Observatory Deep Carbon Virtual Observatory Virtual Brain Observatory

Data management at 14 I move to NIST 7/28/2014 as Director, Office of Data and Informatics, Material Measurement Laboratory Materials science, chemistry, biology Materials Genome Initiative Foster a culture of data management, curation, re-use in a benchscientist / PI-dominated organization having a strong record of providing gold standard data Inward-looking challenges Tools, support, advice, common platforms, solution broker Big data, lots of small/medium data Outward-looking challenges Service directory Modern web interfaces, APIs, better service integration Get better sense of what communities want from NIST Define standards, standard practices Collaboration: other government agencies, universities, domain repositories

15 http://www.nist.gov/mgi

NDS and domain repositories 16 Domain repositories are discipline-specific Various business models in use; long-term sustainability is a major challenge* Potential NDS roles Customizable data management and curation tools built on a common substrate Access to cloud-like storage but at non-commercial rates A directory of ontology-building and metadata management tools A directory of domain repositories Accreditation services Advice, referral services, genius bar * Sustaining Domain Repositories for Digital Data: A White Paper, C. Ember & R. Hanisch, eds. http://datacommunity.icpsr.umich.edu/sites/default/files/whitepaper_icpsr_s DRDD_121113.pdf

Technologies/standards to build on 17 Just use the VO standards! OK, seriously NIH syndrome Much could be re-used in terms of architecture Generic, collection-level metadata Cross-talk with Research Data Alliance (ANDS, EUDAT) Data Citation WG Data Description Registry Interoperability WG Data Type Registries WG Domain Repositories IG Long Tail of Research Data IG Metadata IG Metadata Standards Directory WG Preservation e-infrastructure WG and others Dataverse, Dryad, irods, DSpace, etc.

Lessons learned re/ federation 18 It takes more time than you think Community consensus requires buy-in early and throughout Top-down imposition of standards likely to fail Balance requirements coming from a researchoriented community with innovation in IT Marketing is very important Managing expectations Build it, and they might come Coordination at the international level is essential But takes time and effort Data models sometimes seem obvious, more often not Metadata collection and curation are eternal but essential tasks

Lessons learned re/ federation 19 For example, the Cancer Biomedical Informatics Grid (cabig) [$350M] goal was to provide shared computing for biomedical research and to develop software tools and standard formats for information exchange. The program grew too rapidly without careful prioritization or a cost-effective business model. software is overdesigned, difficult to use, or lacking support and documentation. "The failure to link the mission objectives to the technology shows how important user acceptance and buy-in can be. (M. Biddick, Fusion PPT) J. Foley, InformationWeek, 4/8/2011 http://www.informationweek.com/architecture/report-blasts-problem-plaguedcancer-research-grid/d/d-id/1097068?