Situations and Ontologies: helping geoscientists understand and share the semantics surrounding their computational resources

Mark Gahegan
GeoVISTA Center, Department of Geography, The Pennsylvania State University, USA

The SemantiWebCyberGrid

Three NSF CyberInfrastructure (CI) projects
The Geosciences Network (GEON): www.geongrid.org
Human Environment Regional Observatories (HERO): www.hero.psu.edu
Learning Activities in Digital Libraries: www.dialogplus.org

CI Goals
How can communities of geoscientists do better science (more effective, more efficient) by sharing their resources: computing power, data, tools, results?
Making resources available is not the same as making them useful to others.
Litmus tests:
Contributing to cyber-infrastructure must become an integral part of the way scientists and educators work.
Will future generations of scientists be able to follow our work?

BIG Problems
"Systems, scientific and philosophic, come and go. Each method of limited understanding is at length exhausted. In its prime each system is a triumphant success: in its decay it is an obstructive nuisance." (Alfred North Whitehead, Adventures of Ideas)
Our current systems for doing geoscience are:
Closed
Private
Narrow in scope
We duplicate research efforts and equipment needs
It is difficult to share outcomes with other researchers
Science funding agencies do not like this

Crux of the problem
Within the geosciences, the meaning of geoinformation is constructed, shaped and changed by the interaction of people, data and systems.
But this interaction is not captured, and information becomes separated from the situations by which it is given meaning.

EXAMPLE: Map construction and semantic conflict
[Figure: two panels, "C: intra-geologist clustering" and "C: inter-geologist similarity", each plotting Distance MMD (mean) against Time (weeks), weeks 1 to 15. Source: Davenport and others, 1996.]

Developments in computer science

Background: layers of Cyber-infrastructure
Grid: physical implementation, TERAGRID + small local nodes (GEON)
Peer-to-Peer (P2P) computing: software technology that enables networked computers to communicate (exchange information) without a common operating environment
Registry, authentication, access control, monitoring, replication, distributed filesystem, collection management (SRB), job submission
Web services: mechanisms integrated into the Grid model through the Open Grid Services Architecture (OGSA), data integration, visualization
Semantic services: describing and searching for web content using formalized semantics (ontologies, provenance, workflow management)
Collaborative knowledge environments: knowledge portals, asynchronous discussions, video conferencing, collaboratories

Towards a knowledge collaboratory

The GEON GRID: physical implementation
[Figure: GEON grid nodes. Labeled sites: Geological Survey of Canada, Chronos, Livermore, KGS, USGS, ESRI, CUAHSI. Legend: PoP node, partner projects, compute cluster, data cluster, partner services, 1TF cluster.]
Knowledge CyberInfrastructure, 2/23/2005

GRID computing

How can we organize our knowledge resources?

Literals, popularity and corporate sponsorship are how we currently organize resources and search the Internet!
The Semantic Web: describing and searching for web content using formalized semantics (controlled vocabularies, taxonomies, ontologies, ...)
Avoids some language problems and ambiguities

Top-down knowledge representation
Ways to organize formal knowledge:
Controlled vocabularies
Database schema (relational, XML, ...)
Conceptual schema (ER-diagrams, UML, ...)
Thesauri (synonyms, broader term/narrower term)
Taxonomies (e.g. geology, soils, landcover/landuse)
Ontologies, e.g., in [Description] Logic (OWL): constrain the possible interpretations of terms
What is an ontology?
An ontology specifies a theory (a set of models) by defining and relating concepts within a domain of interest, using formal logic.

Why ontologies? (Noy and McGuinness)
To share common understanding of the structure of information among people or software agents
To enable reuse of domain knowledge
To make domain assumptions explicit
To separate domain knowledge from operational knowledge
To analyze domain knowledge

Sample EarthRealm Ontology from NASA
<owl:Class rdf:ID="marsh">
  <rdfs:subClassOf rdf:resource="#coastalregion"/>
  <rdfs:subClassOf rdf:resource="#wetlandregion"/>
</owl:Class>
<owl:Class rdf:ID="wetlandregion">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="http://sweet.jpl.nasa.gov/ontology/space.owl#ispartof"/>
      <owl:allValuesFrom rdf:resource="#landwatersurfacelayer"/>
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf rdf:resource="#landregion"/>
</owl:Class>
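One thing such a class hierarchy makes possible is simple inference: if a marsh is a wetland region and a wetland region is a land region, then a marsh is a land region. The following is a minimal Python sketch of that idea, not an OWL reasoner. The class names are copied as they appear in the fragment above; the dictionary layout and the function names `superclasses` and `is_a` are assumptions made for illustration, and the property restriction on the wetland class is deliberately left out.

```python
from collections import deque

# Named subclass assertions taken from the fragment above.
# The owl:Restriction on the wetland class is not modelled here.
SUBCLASS_OF = {
    "marsh": ["coastalregion", "wetlandregion"],
    "wetlandregion": ["landregion"],
}


def superclasses(cls, subclass_of=SUBCLASS_OF):
    """Return every class reachable from cls by following subclass links."""
    seen = set()
    queue = deque(subclass_of.get(cls, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(subclass_of.get(parent, []))
    return seen


def is_a(cls, candidate, subclass_of=SUBCLASS_OF):
    """True if cls is the candidate class or one of its (indirect) subclasses."""
    return cls == candidate or candidate in superclasses(cls, subclass_of)


if __name__ == "__main__":
    print(sorted(superclasses("marsh")))
    print(is_a("marsh", "landregion"))
```

A search for resources about land regions could use `is_a` to also return resources tagged only as marshes, which is the kind of ambiguity-avoiding retrieval the Semantic Web slides describe.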

Rock Taxonomy
Geological taxonomy converted to an ontology
Gathered from experts during a specially convened workshop
Formalizes relationships between concepts
GEON PIs: Randy Keller (UTEP), Bertram Ludaescher, Kai Lin, Dogan Seber (SDSC)

An alternative rock taxonomy!
Rock music taxonomy converted to a concept map
Gathered automatically from consumer purchasing logs
Assumes relationships between concepts

Why Not Ontologies!
Top-down knowledge (ontologies) only gets you so far; other kinds of knowledge are also very important and useful:
Experiences
Use-cases (situations surrounding the use of resources)
Social networks
Most current ontologies are static resources
Our understanding of the Earth is dynamic and continually evolving
Unless ontologies are community-owned, dynamic resources, they will soon become part of the problem, not part of the solution
What happens to all the millions of computational resources that predate ontologies? The cost of retrofitting ontologies is prohibitive.

Knowledge Soup (Sowa, 2002)
According to Heraclitus, panta rhei: everything is in flux. But what gives that flux its form is the logos: the words or signs that enable us to perceive patterns in the flux, remember them, talk about them, and take action upon them even while we ourselves are part of the flux we are acting in and on.

Earth Science Multi-site knowledge soup Sowa, 2002

Situations: data provenance
Creation | Application | Represented by
Who did it? | Who should use it? | Collections of people
Where was it made? | Where does it apply? | Collections of sites / scales
When was it made? | When does it apply? | Collections of temporal intervals
How was it made? | How should it be used? | Collections of methods and data
Why was it made? | Why should it be used? | Collections of research questions, motivations, theories
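The who/where/when/how/why questions, asked once about creation and once about application, suggest a simple record structure for a situation. The sketch below is one possible Python encoding under stated assumptions: the class name `Situation`, the field names and the placeholder values are all hypothetical and are not taken from the Codex implementation.

```python
from dataclasses import dataclass, field

# The five questions asked of every resource, on both sides.
FACETS = ("who", "where", "when", "how", "why")


@dataclass
class Situation:
    """Provenance of one resource: answers about its creation and its application."""
    resource: str
    creation: dict = field(default_factory=dict)     # facet -> collection of answers
    application: dict = field(default_factory=dict)  # facet -> collection of answers

    def unanswered(self):
        """Return the (side, facet) pairs that have no recorded answer yet."""
        missing = []
        for side in ("creation", "application"):
            answers = getattr(self, side)
            for facet in FACETS:
                if not answers.get(facet):
                    missing.append((side, facet))
        return missing


if __name__ == "__main__":
    # Placeholder values only; a real record would point at people, sites,
    # time intervals, methods and research questions.
    s = Situation(
        resource="example-dataset",
        creation={"who": ["analyst-a"], "how": ["method-1"]},
        application={"why": ["research-question-1"]},
    )
    print(s.unanswered())
```

Keeping each answer as a collection mirrors the "Represented by" column: a resource can be tied to several people, sites, intervals or motivations at once.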

Codex: A nexus of knowledge structures

implemented as a web portal
http://hero.geog.psu.edu/codex

Data provenance: Learning from use-cases
Who created that resource?
When was it created?
How often has it been used?
Has it been modified recently?
Who has used it?
What has it been used with?
Such questions add a rich context by capturing situations surrounding resource usage
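Each of these questions can be answered directly from a log of user actions. The following Python sketch shows the idea on an in-memory list; the row layout (session, user, resource, action, timestamp), the action names and every value in `LOG` are hypothetical, since the slides do not show the actual log schema.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log rows: (session, user, resource, action, timestamp)
LOG = [
    ("s1", "user-a", "dataset-1", "create", datetime(2005, 1, 10)),
    ("s1", "user-a", "tool-x", "use", datetime(2005, 1, 10)),
    ("s2", "user-b", "dataset-1", "use", datetime(2005, 1, 12)),
    ("s2", "user-b", "tool-y", "use", datetime(2005, 1, 12)),
    ("s3", "user-b", "dataset-1", "modify", datetime(2005, 2, 1)),
]


def creator(log, resource):
    """Who created that resource? Returns None if no create action is logged."""
    for _, user, res, action, _ in log:
        if res == resource and action == "create":
            return user
    return None


def use_count(log, resource):
    """How often has it been used?"""
    return sum(1 for _, _, res, action, _ in log
               if res == resource and action == "use")


def users_of(log, resource):
    """Who has used it?"""
    return {user for _, user, res, action, _ in log
            if res == resource and action == "use"}


def last_modified(log, resource):
    """When was it last modified? Returns None if it never was."""
    times = [ts for _, _, res, action, ts in log
             if res == resource and action == "modify"]
    return max(times) if times else None


def used_with(log, resource):
    """What has it been used with? Counts resources sharing a session with it."""
    sessions = {s for s, _, res, _, _ in log if res == resource}
    return Counter(res for s, _, res, _, _ in log
                   if s in sessions and res != resource)


if __name__ == "__main__":
    print(creator(LOG, "dataset-1"), use_count(LOG, "dataset-1"))
    print(used_with(LOG, "dataset-1"))
```

The `used_with` counts are the raw material for association rule mining: resources that repeatedly appear in the same session are candidates for a rule.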

Logging resource usage data: capturing situations of use
Logged usage data (Oracle, MySQL)

Mining association rules from use-case logs
Association rules are mined from user action logs using the WEKA (Waikato Environment for Knowledge Analysis) API
Apriori algorithm (Agrawal and Srikant, 1994)
Tools added for data preprocessing and classifying:
attribute selector: allows the user to select a subset of data attributes
data filters: allow the user to define filters that convert String, Time and Numeric data in any attribute column to nominal data for association mining
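To make the mining step concrete, here is a small pure-Python sketch of the Apriori idea: find itemsets that occur in at least a minimum fraction of sessions, then derive rules whose confidence passes a threshold. It is a stand-in for illustration and does not use or reproduce the WEKA API; the function names `apriori` and `rules`, the thresholds and the `SESSIONS` data are all assumptions.

```python
from itertools import combinations

# Hypothetical sessions: the set of resources touched in each one.
SESSIONS = [
    {"dataset-1", "tool-x"},
    {"dataset-1", "tool-x", "tool-y"},
    {"dataset-1", "tool-y"},
    {"dataset-2", "tool-x"},
]


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that are frequent at this level.
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for each strong rule."""
    found = []
    for itemset, supp in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(itemset, size):
                antecedent = frozenset(antecedent)
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, confidence))
    return found


if __name__ == "__main__":
    frequent = apriori(SESSIONS, min_support=0.5)
    for antecedent, consequent, confidence in rules(frequent, min_confidence=0.9):
        print(set(antecedent), "->", set(consequent), confidence)
```

Because Apriori works on itemsets of nominal values, numeric or time attributes in a log would first have to be binned into categories, which is the role the slide gives to the data filters.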

What can we do with these knowledge resources?

Examples of knowledge use in analysis tasks
1. Situating geoscience resources
2. Ontology-based map integration
3. Packaging and navigating learning activities
4. Exploring and constructing categories
5. Automatically building complex workflows

1. Situating resources (GEON: Randy Keller's gravity map)

Situated by data and methods

and people
Bill Pike (PSU): GEON researcher

Can we get there from here?

Summary
What is possible, what can be represented? Ontologies, use cases and a nexus of relationships show what can be represented.
What is desirable, what is useful? Ontology-based schema interoperation, adding value to (contextualizing) resources, automated workflow creation, and integration of data, methods and concepts in analysis show what we can do.
How can we facilitate adoption? Web portals and GRID networks offer a means of adoption by providing accessible services that communities of scientists can use.
BUT

Many challenges
Technical
Conceptual
Sociological
What needs to change?
Ongoing NSF funding for cyberinfrastructure
Participation and adoption by science communities (risk, resistance, change of work habits, embedding of knowledge tools in current systems)
Recognition that contributing to cyberinfrastructure is a valid and worthwhile science outcome (just like publishing papers)

Current projects
Add more perspectives onto resources (e.g. educational, review).
Improve the transition between semi-formal concept maps (provided by domain scientists) and formal (computable) ontologies that can be used to semantically integrate information.
Comparing knowledge schema: how much do we agree?
(Long term) Can ontological knowledge be inferred through situational data such as use histories?

Credits
GeoVISTA Center, Penn State (HERO, GEON): Bill Pike, James O'Brien, Junyan Luo, Brandi Nagle, Xiping Dai, Gary Sheppard, Sachin Oswal
San Diego Supercomputer Center (GEON): Chaitan Baru, Kai Lin
Southampton, UK (DialogPLUS): Chris Bailey
This research is funded by:
NSF BCS-9978052 (HERO)
NSF ITR (EAR)-0225673 (GEON)
NSF/JISC Digital Libraries for Education (DialogPLUS)
NSF ITR (BCS)-0219025
NGA-NURI Program

The End
Questions?