Situations and Ontologies: helping geoscientists understand and share the semantics surrounding their computational resources
Mark Gahegan
GeoVISTA Center, Department of Geography, The Pennsylvania State University, USA
The SemantiWebCyberGrid
Three NSF CyberInfrastructure (CI) projects
  The Geosciences Network (GEON): www.geongrid.org
  Human Environment Regional Observatories (HERO): www.hero.psu.edu
  Learning Activities in Digital Libraries: www.dialogplus.org
CI Goals
How can communities of geoscientists do better science (more effective, more efficient) by sharing their resources: computing power, data, tools, results?
Making resources available is not the same as making them useful to others.
Litmus tests:
  Contributing to cyber-infrastructure must become an integral part of the way scientists and educators work.
  Will future generations of scientists be able to follow our work?
BIG Problems
"Systems, scientific and philosophic, come and go. Each method of limited understanding is at length exhausted. In its prime each system is a triumphant success: in its decay it is an obstructive nuisance." Alfred North Whitehead, Adventures of Ideas
Our current systems for doing geoscience are:
  Closed
  Private
  Narrow in scope
We duplicate research efforts and equipment needs.
It is difficult to share outcomes with other researchers.
Science funding agencies do not like this.
Crux of the problem
Within the geosciences, the meaning of geoinformation is constructed, shaped and changed by the interaction of people, data and systems.
But this interaction is not captured, and information becomes separated from the situations by which it is given meaning.
EXAMPLE: Map construction and semantic conflict
[Figure: two plots, "C: intra-geologist clustering" and "C: inter-geologist similarity". x-axis: Time (weeks), 1 to 15; y-axis: Distance MMD (mean). Source: Davenport and others, 1996.]
Developments in computer science
Background: layers of Cyber-infrastructure
Grid Physical Implementation: TERAGRID + small local nodes (GEON)
Peer-to-Peer (P2P) Computing: software technology that enables networked computers to communicate (exchange information) without a common operating environment.
Registry, authentication, access control, monitoring, replication, distributed filesystem, collection management (SRB), job submission
Web services: mechanisms are integrated into the Grid model through the Open Grid Services Architecture (OGSA); data integration, visualization.
Semantic Services: describing and searching for web content using formalized semantics: ontologies, provenance, workflow management
Collaborative Knowledge Environments: knowledge portals, asynchronous discussions, video conferencing, collaboratories
Towards a knowledge collaboratory
The GEON GRID: physical implementation
[Diagram. Labels: Geological Survey of Canada, Chronos, Livermore, KGS, USGS, ESRI, CUAHSI. Legend: PoP node, Partner Projects, Compute cluster, Data Cluster, Partner services, 1TF cluster.]
Knowledge CyberInfrastructure 2/23/2005
GRID computing
How can we organize our knowledge resources?
Literals, popularity and corporate sponsorship are how we currently organize resources and search the Internet!
The Semantic Web: describing and searching for web content using formalized semantics (controlled vocabularies, taxonomies, ontologies).
Avoids some language problems and ambiguities.
Top-down knowledge representation
Ways to organize formal knowledge:
  Controlled vocabularies
  Database schema (relational, XML, ...)
  Conceptual schema (ER-diagrams, UML, ...)
  Thesauri (synonyms, broader term/narrower term)
  Taxonomies (e.g. geology, soils, landcover/landuse)
  Ontologies, e.g., in [Description] Logic (OWL): constrain the possible interpretations of terms
What is an ontology?
An ontology specifies a theory (a set of models) by defining and relating concepts within a domain of interest, using formal logic.
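As a rough illustration of how even a thesaurus (synonyms plus broader/narrower terms) changes search compared with literal matching, here is a minimal Python sketch. The vocabulary entries, the document titles and the function names are all invented for illustration; none of them come from GEON, HERO or any real thesaurus, and the sketch assumes the narrower-term links contain no cycles.

```python
# Hypothetical thesaurus: each term lists its synonyms and narrower terms.
THESAURUS = {
    "wetland": {"synonyms": {"wetlands"}, "narrower": {"marsh", "swamp"}},
    "marsh": {"synonyms": {"marshland"}, "narrower": set()},
    "swamp": {"synonyms": set(), "narrower": set()},
}


def expand(term, thesaurus=THESAURUS):
    """Return the term plus its synonyms and all narrower terms (recursively)."""
    entry = thesaurus.get(term, {"synonyms": set(), "narrower": set()})
    terms = {term} | entry["synonyms"]
    for narrower in entry["narrower"]:
        terms |= expand(narrower, thesaurus)
    return terms


def literal_search(query, documents):
    """Match only documents that contain the query word itself."""
    return [d for d in documents if query in d.lower().split()]


def semantic_search(query, documents, thesaurus=THESAURUS):
    """Match documents that contain the query or any term it expands to."""
    wanted = expand(query, thesaurus)
    return [d for d in documents if wanted & set(d.lower().split())]


# Hypothetical document titles.
DOCUMENTS = ["Marsh sediment cores", "Desert dune survey", "Wetlands inventory"]
```

With these invented titles, a literal search for "wetland" finds nothing, while the expanded search finds both the marsh and the wetlands documents.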
Why ontologies? (Noy and McGuinness)
  To share common understanding of the structure of information among people or software agents
  To enable reuse of domain knowledge
  To make domain assumptions explicit
  To separate domain knowledge from operational knowledge
  To analyze domain knowledge
Sample EarthRealm Ontology from NASA
</owl:Class>
<owl:Class rdf:ID="marsh">
  <rdfs:subClassOf rdf:resource="#coastalregion"/>
  <rdfs:subClassOf rdf:resource="#wetlandregion"/>
</owl:Class>
<owl:Class rdf:ID="wetlandregion">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="http://sweet.jpl.nasa.gov/ontology/space.owl#ispartof"/>
      <owl:allValuesFrom rdf:resource="#landwatersurfacelayer"/>
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf rdf:resource="#landregion"/>
</owl:Class>
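To show what the subclass axioms in the excerpt above let software infer, here is a small Python sketch that follows subClassOf links transitively. It is an illustration only, not a description logic reasoner: the class names are copied in the lowercase form shown on the slide, the property restriction on wetlandregion is deliberately ignored, and the function names are hypothetical.

```python
# Named-class subclass axioms transcribed from the slide's OWL excerpt.
# The owl:Restriction on "wetlandregion" is left out of this sketch.
SUBCLASS_OF = {
    "marsh": {"coastalregion", "wetlandregion"},
    "wetlandregion": {"landregion"},
}


def superclasses(cls, axioms=SUBCLASS_OF):
    """Return every class reachable from cls through subClassOf links."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in axioms.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def is_a(cls, ancestor, axioms=SUBCLASS_OF):
    """True if cls is the same as ancestor or a (transitive) subclass of it."""
    return cls == ancestor or ancestor in superclasses(cls, axioms)
```

From the two stated axioms, the sketch derives that a marsh is also a landregion, a fact that is never written down explicitly in the excerpt.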
Rock Taxonomy
Geological taxonomy converted to an ontology
Gathered from experts during a specially convened workshop
Formalizes relationships between concepts
GEON PIs: Randy Keller (UTEP), Bertram Ludaescher, Kai Lin, Dogan Seber (SDSC)
An alternative rock taxonomy!
Rock music taxonomy converted to a concept map
Gathered automatically from consumer purchasing logs
Assumes relationships between concepts
Why Not Ontologies!
Top-down knowledge (ontologies) only gets you so far; other kinds of knowledge are also very important and useful:
  Experiences
  Use-cases (situations surrounding the use of resources)
  Social networks
Most current ontologies are static resources.
Our understanding of the Earth is dynamic and continually evolving.
Unless ontologies are community-owned, dynamic resources, they will soon become part of the problem, not part of the solution.
What happens to all the millions of computational resources that predate ontologies? The cost of retro-fitting ontologies is prohibitive.
Knowledge Soup (Sowa, 2002)
"According to Heraclitus, panta rhei: everything is in flux. But what gives that flux its form is the logos: the words or signs that enable us to perceive patterns in the flux, remember them, talk about them, and take action upon them even while we ourselves are part of the flux we are acting in and on."
Earth Science Multi-site knowledge soup Sowa, 2002
Situations: data provenance
Creation           | Application            | Represented by
Who did it?        | Who should use it?     | Collections of people
Where was it made? | Where does it apply?   | Collections of sites / scales
When was it made?  | When does it apply?    | Collections of temporal intervals
How was it made?   | How should it be used? | Collections of methods and data
Why was it made?   | Why should it be used? | Collections of research questions, motivations, theories
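One way to picture the creation/application table as a data structure is sketched below in Python. This is an assumption about a possible representation, not the Codex schema: the class names, field names and the sample values (creator_a, region_x and so on) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Facet:
    """Answers to who/where/when/how/why, each held as a collection."""
    people: set = field(default_factory=set)            # who
    sites: set = field(default_factory=set)             # where (sites / scales)
    intervals: set = field(default_factory=set)         # when (temporal intervals)
    methods_and_data: set = field(default_factory=set)  # how
    motivations: set = field(default_factory=set)       # why (questions, theories)


@dataclass
class Situation:
    """A resource together with its creation and application contexts."""
    resource: str
    creation: Facet = field(default_factory=Facet)
    application: Facet = field(default_factory=Facet)


QUESTION_TO_FIELD = {
    "who": "people",
    "where": "sites",
    "when": "intervals",
    "how": "methods_and_data",
    "why": "motivations",
}


def answer(situation, phase, question):
    """Look up one cell of the table; phase is 'creation' or 'application'."""
    facet = getattr(situation, phase)
    return getattr(facet, QUESTION_TO_FIELD[question])


# Hypothetical example values, for illustration only.
example = Situation(
    resource="gravity_map",
    creation=Facet(people={"creator_a"}, sites={"region_x"}),
    application=Facet(people={"analyst_b"}, sites={"region_x", "region_y"}),
)
```

Keeping every answer as a set mirrors the "collections of" wording in the third column: a resource can have several creators, apply to several sites, and serve several research questions.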
Codex: A nexus of knowledge structures
Implemented as a web portal: http://hero.geog.psu.edu/codex
Data provenance: Learning from use-cases
  Who created that resource?
  When was it created?
  How often has it been used?
  Has it been modified recently?
  Who has used it?
  What has it been used with?
Such questions add a rich context by capturing situations surrounding resource usage.
Logging resource usage data: capturing situations of use
[Diagram label: logged usage data (Oracle, MySQL)]
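A usage log of this kind could answer the use-case questions with a few queries. The sketch below uses Python's built-in sqlite3 as a stand-in for the Oracle and MySQL stores named on the slide; the table layout, column names, actions and sample rows are all assumptions made for illustration, not the actual Codex log schema.

```python
import sqlite3

# In-memory database standing in for the real log store.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE usage_log (
           user_name  TEXT    NOT NULL,
           resource   TEXT    NOT NULL,
           action     TEXT    NOT NULL,   -- 'create' or 'use'
           session_id INTEGER NOT NULL,
           used_at    TEXT    NOT NULL    -- ISO date
       )"""
)

# Hypothetical log entries.
conn.executemany(
    "INSERT INTO usage_log VALUES (?, ?, ?, ?, ?)",
    [
        ("user_a", "gravity_map", "create", 1, "2005-01-10"),
        ("user_b", "gravity_map", "use", 2, "2005-01-12"),
        ("user_b", "fault_lines", "use", 2, "2005-01-12"),
        ("user_c", "gravity_map", "use", 3, "2005-02-01"),
        ("user_c", "cluster_tool", "use", 3, "2005-02-01"),
    ],
)


def created_by(resource):
    """Who created that resource?"""
    rows = conn.execute(
        "SELECT user_name FROM usage_log WHERE resource = ? AND action = 'create'",
        (resource,),
    )
    return [r[0] for r in rows]


def times_used(resource):
    """How often has it been used?"""
    row = conn.execute(
        "SELECT COUNT(*) FROM usage_log WHERE resource = ? AND action = 'use'",
        (resource,),
    ).fetchone()
    return row[0]


def used_by(resource):
    """Who has used it?"""
    rows = conn.execute(
        "SELECT DISTINCT user_name FROM usage_log "
        "WHERE resource = ? AND action = 'use' ORDER BY user_name",
        (resource,),
    )
    return [r[0] for r in rows]


def used_with(resource):
    """What has it been used with? (other resources in the same session)"""
    rows = conn.execute(
        "SELECT DISTINCT b.resource FROM usage_log a "
        "JOIN usage_log b ON a.session_id = b.session_id "
        "WHERE a.resource = ? AND b.resource <> ? ORDER BY b.resource",
        (resource, resource),
    )
    return [r[0] for r in rows]
```

Grouping entries by session is the design choice that makes the "used with" question answerable: co-occurrence within a session is treated here as a proxy for resources being used together.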
Mining association rules from use-case logs
Association rules are mined from user action logs using the Apriori algorithm (Agrawal and Srikant, 1994) through the WEKA (Waikato Environment for Knowledge Analysis) API.
Tools added for data preprocessing and classifying:
  attribute selector: allows the user to select a subset of data attributes.
  data filters: allow the user to define filters that convert String, Time or Numeric data in any attribute column to nominal data for association mining.
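The idea behind Apriori-style rule mining can be sketched in a few lines of plain Python. This is a simplified teaching version written for this note, not the WEKA implementation the project uses, and the sessions listed at the end are invented. Support is the fraction of sessions containing an itemset; confidence of a rule A -> B is support(A and B) divided by support(A).

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; returns {itemset: support fraction}."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules above the threshold."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


# Hypothetical sessions: the set of resources used together in each one.
SESSIONS = [
    {"gravity_map", "fault_lines"},
    {"gravity_map", "fault_lines", "cluster_tool"},
    {"gravity_map", "cluster_tool"},
    {"fault_lines"},
]
```

With these invented sessions, a minimum support of 0.5 and a minimum confidence of 0.9, the only rule that survives is cluster_tool -> gravity_map: every session that used the clustering tool also used the gravity map.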
What can we do with these knowledge resources?
Examples of knowledge use in analysis tasks
1. Situating geoscience resources
2. Ontology based map integration
3. Packaging and navigating learning activities
4. Exploring and constructing categories
5. Automatically building complex workflows
1. Situating resources (GEON: Randy Keller's gravity map)
Situated by data and methods
and people
Bill Pike (PSU): GEON researcher
Can we get there from here?
Summary
What is possible, what can be represented? Ontologies, use cases and a nexus of relationships show what can be represented.
What is desirable, what is useful? Ontology-based schema interoperation, adding value to (contextualizing) resources, automated workflow creation, and integration of data, methods and concepts in analysis show what we can do.
How can we facilitate adoption? Web portals and GRID networks offer a means of adoption by providing accessible services that communities of scientists can use.
BUT...
Many challenges
  Technical
  Conceptual
  Sociological
What needs to change?
  Ongoing NSF funding for cyberinfrastructure
  Participation and adoption by science communities (risk, resistance, change of work habits, embedding of knowledge tools in current systems)
  Recognition that contributing to cyberinfrastructure is a valid and worthwhile science outcome (just like publishing papers)
Current projects
Add more perspectives onto resources (e.g. educational, review).
Improve the transition between semi-formal concept maps (provided by domain scientists) and formal (computable) ontologies that can be used to semantically integrate information.
Comparing knowledge schema: how much do we agree?
(Long term) Can ontological knowledge be inferred from situational data such as use histories?
Credits
GeoVISTA Center, Penn State (HERO, GEON): Bill Pike, James O'Brien, Junyan Luo, Brandi Nagle, Xiping Dai, Gary Sheppard, Sachin Oswal
San Diego Supercomputer Center (GEON): Chaitan Baru, Kai Lin
Southampton, UK (DialogPLUS): Chris Bailey
This research is funded by:
  NSF BCS-9978052 (HERO)
  NSF ITR (EAR)-0225673 (GEON)
  NSF/JISC Digital Libraries for Education (DialogPLUS)
  NSF ITR (BCS)-0219025
  NGA-NURI Program
The End Questions?