Open Persistent Access to al Robert J. Sandusky, University of Illinois at Chicago (UIC). The DataNet Partners Update: DataONE and the Data Conservancy. December 14, 2009
Outline: NSF's DataNet Program; DataONE Introduction; Motivating Challenges; Data Drives Science; DataONE Overview; Who; Scope; Virtual Organization; Cyberinfrastructure Architecture; What's Happened So Far; Year 1 Goals
NSF's DataNet Program. Each DataNet project will: provide reliable digital preservation, access, integration and support for analysis; adapt to changes in technologies and user needs/expectations; be on the leading edge in research in computer science and cyberinfrastructure; be a component for interoperable data preservation and access in the DataNet Partners. NSF: provides $20 million for 5 years; expects self-sustaining virtual organizations, viable for many decades.
DataONE Introduction: Environmental Challenges
DataONE Overview: Environmental Challenges (Smith, Knapp, Collins. In press.)
DataONE Overview: Environmental Challenges. Health of ecological services affects human well-being. Support processes: nutrient cycling, soil formation. Provisioning: food production, fresh water, wood / fiber, fuel. Regulation of: climate, flood, disease. Security: personal safety, resource access, security from disaster. Basic materials: adequate livelihood, sufficient food, shelter / goods. Health: clean air / water, strength. Social relations: social cohesion. Freedom of choice and action: opportunities to achieve. Adapted from Millennium Ecosystem Assessment.
DataONE Scope: Who. PI: William K. Michener, University of New Mexico. Co-PIs: Robert Cook, Oak Ridge National Laboratory (ORNL); Michael Frame, U.S. Geological Survey National Biological Information Infrastructure (USGS NBII); Stephanie Hampton, National Center for Ecological Analysis and Synthesis (NCEAS); Kathleen Smith, National Evolutionary Synthesis Center (NESCent); California Digital Library. Co-Investigators from: California Digital Library; University of California - Davis; University of Southampton; CSIRO, Australia; Cornell University; Ecological Society of America; Keystone Center; NCSA; University of Illinois at Chicago; University of Kansas; University of Manchester; University of Michigan; University of Southern CA; University of Tennessee, Knoxville; University of Edinburgh; Utah State University; UNM, NCEAS, NESCent, ORNL.
DataONE Scope: Biological (e.g., gene, organism, population, species, community, biome, ecosystem); Environmental (e.g., atmospheric, chemical, ecological, hydrological, oceanographic, physical); Social (e.g., land use, human population); Economic (e.g., trade, ecosystem services, resource extraction)
DataONE Scope: Halpern et al., 2008. A Global Map of Human Impact on Marine Ecosystems. Science 319, 15 February 2008, 948.
DataONE Scope: Objectives. DataONE strategic objectives: 1. Engage the broadest possible community. 2. Create an informatics-literate populace. 3. Build an extensive data resource. 4. Build infrastructure to support the full data life cycle. 5. Ensure financial support and sustainability. 6. Provide responsive governance and management. Vision: universal access to data about life on earth and the environment that sustains it.
NSF Engagement, Coordination and Management: DataNet Partners; Principal Investigator; Leadership Team; Director, Development & Operations; Director, Community Engagement & Outreach; R&D; Core Cyberinfrastructure Team; R&D
NSF Engagement, Coordination and Management: External Advisory Committee; DataNet Partners; Director, Development & Operations; Principal Investigator; Executive Director; DataONE Office; Leadership Team; Director, Community Engagement & Outreach; R&D; CI Operations; Core CI Team; R&D; Operations; DIUG; Education and Outreach Team. Cyberinfrastructure & Research Working Groups: Federated security; Distributed storage; Data preservation, metadata, and interoperability; Scientific workflows; Data integration and semantics; Exploration, Visualization, Analysis; Usability and assessment. Engagement & Research Working Groups: Sociocultural barriers to data sharing and preservation; Community engagement and education; Citizen science and public outreach; Long-term sustainability and governance; Exploration, Visualization, Analysis; Usability and assessment.
Cyberinfrastructure Objectives: Support synthesis in earth observation sciences. Support the full lifecycle of the scientific process: data acquisition and management; data preservation; data discovery and access; data integration; data analysis and visualization; process management and preservation. Evolve to accommodate technology change.
DataONE CI Design Goals: Distributed data management at distributed nodes; replication and caching for preservation and performance; software must provide benefits for scientists today; support and adapt existing community software efforts; evolution of software and standards; emphasize Free and Open Source Software.
DataONE Cyberinfrastructure. Coordinating Nodes: retain complete metadata catalog; hold a subset of all data; perform basic indexing; provide network-wide services; ensure data availability (preservation); provide replication services. Member Nodes: diverse institutions; serve local community; provide resources for managing their data. Flexible, scalable, sustainable network.
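The division of labor between the two node types can be sketched in a few lines of Python. This is an illustrative toy model only, not the DataONE implementation: the class names `MemberNode` and `CoordinatingNode`, their methods, and the node names in the usage note are all assumptions introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """Hypothetical member node: holds data objects for its local community."""
    name: str
    objects: dict[str, bytes] = field(default_factory=dict)

    def store(self, identifier: str, payload: bytes) -> None:
        self.objects[identifier] = payload


@dataclass
class CoordinatingNode:
    """Hypothetical coordinating node: keeps the metadata catalog and
    arranges replication among member nodes."""
    members: dict[str, MemberNode] = field(default_factory=dict)
    catalog: dict[str, dict] = field(default_factory=dict)

    def register(self, node: MemberNode) -> None:
        self.members[node.name] = node

    def index(self, identifier: str, origin: str, metadata: dict) -> None:
        # Record the metadata and which member node currently holds the object.
        self.catalog[identifier] = {"metadata": metadata, "holders": [origin]}

    def replicate(self, identifier: str, copies: int = 2) -> list[str]:
        """Copy an object to other member nodes until `copies` holders exist."""
        entry = self.catalog[identifier]
        source = self.members[entry["holders"][0]]
        for name, node in sorted(self.members.items()):
            if len(entry["holders"]) >= copies:
                break
            if name not in entry["holders"]:
                node.store(identifier, source.objects[identifier])
                entry["holders"].append(name)
        return entry["holders"]

    def search(self, keyword: str) -> list[str]:
        """Basic indexing: match a keyword against catalog metadata values."""
        needle = keyword.lower()
        return sorted(
            ident
            for ident, entry in self.catalog.items()
            if any(needle in str(v).lower() for v in entry["metadata"].values())
        )
```

In use, a member node named `"a"` would store an object, the coordinating node would index its metadata, and `replicate("obj1", copies=2)` would place a second copy on another registered member node while the catalog tracks both holders.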
DataONE Deployment
DataONE CI Components
Node Design. Member nodes: geographically distributed nodes; observing institutions; libraries contributing capacity, leveraging repositories; existing disciplinary repositories; government agencies; authoritative repository for many datasets; diversity tolerant (less tightly coordinated); freedom to try new tools, methods, and leapfrog forward; location of replicated data. Coordinating nodes: completely replicated; complete metadata catalog; tightly coordinated, stable service platform; provide centralized services.
DataONE Service API: Federated Identity and Authorization Services; Object Management Services; Discovery and Usage Services; Preservation Services; Network Services
Service API for Interoperability: Common access methods for different clients; mechanism to map heterogeneous services; provide interface between nodes and service requests; simplicity of construction; lightweight; ease of implementation; implementations are hidden from service consumers.
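One way to picture "common access methods" with hidden implementations is an abstract interface with per-repository adapters. The sketch below is only an illustration under assumptions: `ObjectService`, `InMemoryRepository`, `PrefixedArchive`, and `fetch_all` are hypothetical names introduced here and are not part of the DataONE Service API.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class ObjectService(ABC):
    """Hypothetical common interface that every node exposes to clients."""

    @abstractmethod
    def get(self, identifier: str) -> bytes:
        """Return the bytes of the object with this network-wide identifier."""

    @abstractmethod
    def list_identifiers(self) -> list[str]:
        """Return all network-wide identifiers served by this node."""


class InMemoryRepository(ObjectService):
    """Stand-in for a repository that already uses network-wide identifiers."""

    def __init__(self, objects: dict[str, bytes]) -> None:
        self._objects = dict(objects)

    def get(self, identifier: str) -> bytes:
        return self._objects[identifier]

    def list_identifiers(self) -> list[str]:
        return sorted(self._objects)


class PrefixedArchive(ObjectService):
    """Stand-in for a repository with its own local ids; the adapter maps
    them to and from network-wide identifiers by adding a prefix."""

    def __init__(self, records: dict[str, bytes], prefix: str) -> None:
        self._records = dict(records)  # keyed by local id
        self._prefix = prefix

    def get(self, identifier: str) -> bytes:
        if not identifier.startswith(self._prefix):
            raise KeyError(identifier)
        return self._records[identifier[len(self._prefix):]]

    def list_identifiers(self) -> list[str]:
        return sorted(self._prefix + local_id for local_id in self._records)


def fetch_all(service: ObjectService) -> dict[str, bytes]:
    """Client code written once against the interface, for any node type."""
    return {ident: service.get(ident) for ident in service.list_identifiers()}
```

The client function `fetch_all` never learns which backend it is talking to, which is the sense in which implementations stay hidden from service consumers.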
Investigator Toolkit: Suite of software tools for researchers. Emphasize Free and Open Source, but support commercial: general analysis frameworks (e.g., R, MATLAB); domain-specific tools (e.g., GARP, Phylocom); organized using scientific workflows (e.g., Kepler, Taverna); portals (e.g., myExperiment, VegBank). Supports the scientific lifecycle: data management and preservation; data query and access; data analysis and visualization; process management and preservation. Communication via the Service API.
DataONE Cyberinfrastructure
Where We Are. Meeting regularly: Core cyberinfrastructure team (input from working group leaders; defining architecture, APIs; beginning first prototype; acquiring CN hardware); Community engagement team; Leadership team; one working group constituted and active. Building the organization: one director hired, others in the pipeline; external advisory board; programmers hired, others in the pipeline; working groups writing charters, identifying members.
Year 1 Goals for Cyberinfrastructure: Launch 3 Coordinating Nodes: ORNL, UNM, UCSB. Launch 3 Member Nodes, drawn from: Dryad at UNC; Distributed Active Archive Center at ORNL; Knowledge Network for Biocomplexity; Water Resource Center (CDL); National Biological Information Infrastructure (NBII) Clearing House. Data and metadata replication. Interoperable metadata search and data retrieval. Basic logging, health and heartbeat.
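The "basic logging, health and heartbeat" goal can be illustrated with a small monitor that records when each node last reported in. This is a sketch under assumptions, not the project's design: `HeartbeatMonitor`, its 60-second default timeout, and the explicit `now` timestamps are hypothetical choices made here to keep the example deterministic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor: tracks the last heartbeat seen from each node."""
    timeout: float = 60.0  # seconds of silence before a node counts as stale
    last_seen: dict[str, float] = field(default_factory=dict)
    log: list[tuple[float, str, str]] = field(default_factory=list)

    def beat(self, node: str, now: float) -> None:
        """Record a heartbeat from `node` at time `now` and log the event."""
        self.last_seen[node] = now
        self.log.append((now, node, "heartbeat"))

    def status(self, now: float) -> dict[str, str]:
        """Classify every known node as 'healthy' or 'stale' at time `now`."""
        return {
            node: "healthy" if now - seen <= self.timeout else "stale"
            for node, seen in self.last_seen.items()
        }
```

A deployment would likely pass `time.time()` as `now`; taking the timestamp as a parameter simply makes the behavior easy to check by hand.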
Discussion: Questions, comments? DataONE: http://www.dataone.org/ E-mail: sandusky@uic.edu. Acknowledgements: Sustainable Digital Data Preservation and Access Network Partners (DataNet); Program Solicitation NSF 07-601