Semantic Annotation, Search and Analysis Borislav Popov, Ontotext
Ontology
- A machine-readable conceptual model
- A common vocabulary for sharing information
- Machine-interpretable definitions of concepts in a domain and their relations
- A formal representation of a set of concepts within a domain and the relationships between them
Conceptual Schemata and Individuals
- Schema(ta): the laws
  - Classes and properties
- Individuals: the population
  - Instances of classes (Alice is a Girl)
  - Facts: characteristics, relationships and interactions of individuals
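The split between schema and individuals can be sketched with plain subject-predicate-object triples. This is only an illustration: apart from "Alice is a Girl", every class, property and individual name below is a hypothetical example, not taken from the slides.

```python
# Schema: the "laws" (classes and properties). All names are hypothetical.
SCHEMA = {
    ("Girl", "subClassOf", "Person"),   # class hierarchy
    ("knows", "domain", "Person"),      # property definition
    ("knows", "range", "Person"),
}

# Individuals: the "population" (instances and facts about them).
INDIVIDUALS = {
    ("Alice", "type", "Girl"),          # instance of a class (from the slide)
    ("Bob", "type", "Person"),          # hypothetical individual
    ("Alice", "knows", "Bob"),          # a fact relating two individuals
}

def instances_of(cls):
    """Individuals explicitly declared as instances of `cls`."""
    return sorted(s for s, p, o in INDIVIDUALS if p == "type" and o == cls)

def facts_about(individual):
    """Non-typing facts in which `individual` is subject or object."""
    return sorted(
        t for t in INDIVIDUALS
        if t[1] != "type" and individual in (t[0], t[2])
    )
```

Here `instances_of("Girl")` lists Alice only; nothing is inferred from the class hierarchy, which is the job of a reasoner.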
Representation Languages and Usage
- RDF and OWL are widely accepted
- Mostly visible through RSS, FOAF, and Linked Open Data (LOD Cloud on the right)
Query Languages
- Define path patterns over the knowledge graph
- The W3C recommendation is SPARQL
- Results are sets of nodes in the graph
- The example on the left defines the pattern: blog posts (and their titles) commented on by people interested in the WWW conference in 2008
[Figure: query graph pattern. Legend: Classes, Properties, Instance]
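A graph pattern query of this kind can be sketched as matching a list of triple patterns against a list of triples. The sketch below is not SPARQL and not the query from the slide; the data, the property names (`commentedBy`, `interestedIn`) and the helper functions are hypothetical, chosen only to mirror the blog-post example.

```python
# Toy graph; every node and property name here is a hypothetical example.
GRAPH = [
    ("post1", "type", "BlogPost"),
    ("post1", "title", "Notes on linked data"),
    ("post1", "commentedBy", "alice"),
    ("alice", "interestedIn", "WWW2008"),
    ("post2", "type", "BlogPost"),
    ("post2", "title", "Unrelated post"),
    ("post2", "commentedBy", "bob"),
    ("bob", "interestedIn", "ISWC2008"),
]

# Pattern terms starting with "?" are variables, everything else is a constant.
PATTERN = [
    ("?post", "type", "BlogPost"),
    ("?post", "title", "?title"),
    ("?post", "commentedBy", "?person"),
    ("?person", "interestedIn", "WWW2008"),
]

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to a different node
        elif term != value:
            return None      # constant does not match
    return new

def query(patterns, graph):
    """Return all variable bindings that satisfy every triple pattern."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for triple in graph:
                b2 = match(pattern, triple, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return bindings

results = query(PATTERN, GRAPH)
titles = sorted({b["?title"] for b in results})
```

Each result is a set of nodes bound to the variables, which is how the results of a pattern query are described above.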
Semantic Annotation (as of 2001)
- Alignment with respect to a conceptual model and an individual in an underlying instance base
- It is both:
  - a sort of metadata, and
  - the process of creating this metadata
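As a rough sketch, an annotation in this sense can be modelled as a text span linked to both a class in the conceptual model and an individual in the instance base. The dictionary lookup below is a deliberately naive stand-in for a real text-mining pipeline; the `kb:` identifiers, class names and sample sentence are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    start: int      # character offset where the mention starts
    end: int        # character offset just past the mention
    cls: str        # class in the conceptual model
    instance: str   # individual in the instance base

# Hypothetical instance base: surface label -> (class, instance identifier).
INSTANCE_BASE = {
    "Sofia": ("City", "kb:Sofia"),
    "Ontotext": ("Company", "kb:Ontotext"),
}

def annotate(text):
    """Link every exact label occurrence in `text` to its class and instance."""
    found = []
    for label, (cls, instance) in INSTANCE_BASE.items():
        pos = text.find(label)
        while pos != -1:
            found.append(Annotation(pos, pos + len(label), cls, instance))
            pos = text.find(label, pos + 1)
    return sorted(found, key=lambda a: a.start)

TEXT = "Ontotext is based in Sofia."
ANNOTATIONS = annotate(TEXT)
```

The resulting annotations are the metadata; running `annotate` is the process that creates it.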
Simple Usage: Highlight, Hyperlink, and
Simple Usage: Explore and Navigate
Identification / Identity Resolution
Semantic Annotation (as of 2008)
Cross Doc Coreference
Semantic Repositories
- Provide storage, indexing, querying and automatic reasoning over structured data
- Use ontologies as semantic schemata and dynamic data models that can change on the fly
[Figure: example graph with explicit and inferred statements. Labels: ptop:agent, ptop:person, ptop:woman, ptop:parentof, ptop:childof, owl:relativeof, owl:symmetricproperty, owl:inverseof, rdfs:subpropertyof, rdfs:range, rdf:type, mydata:ivan, mydata:Maria, inferred]
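The automatic reasoning mentioned above can be illustrated with a small forward-chaining loop over triples. This is a minimal sketch, not how any particular repository implements inference; the schema statements loosely follow the labels in the figure, but which property relates to which is an assumption, and prefixes are dropped for brevity.

```python
# Schema statements (assumed arrangement of the figure's labels).
SCHEMA = {
    ("parentOf", "inverseOf", "childOf"),
    ("parentOf", "subPropertyOf", "relativeOf"),
    ("parentOf", "range", "Person"),
    ("relativeOf", "type", "SymmetricProperty"),
}

# One explicit fact about individuals.
EXPLICIT = {("Maria", "parentOf", "Ivan")}

def infer(triples):
    """Apply four simple rules repeatedly until no new triple is derived."""
    facts = set(triples)
    while True:
        derived = set()
        for s, p, o in facts:
            for a, rel, b in facts:
                if rel == "inverseOf" and p in (a, b):
                    # inverseOf works in both directions
                    derived.add((o, b if p == a else a, s))
                elif a != p:
                    continue  # statement is not about property p
                elif rel == "subPropertyOf":
                    derived.add((s, b, o))
                elif rel == "range":
                    derived.add((o, "type", b))
                elif rel == "type" and b == "SymmetricProperty":
                    derived.add((o, p, s))
        if derived <= facts:
            return facts
        facts |= derived

INFERRED = infer(SCHEMA | EXPLICIT) - SCHEMA - EXPLICIT
```

Because the schema is ordinary data, adding or removing a schema statement and re-running `infer` changes what is derived, which is one way to read "dynamic data models that can change on the fly".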
OWLIM
- OWLIM is Ontotext's semantic database
  - Handles billions of facts, with configurable inference
  - Allows structured queries and full-text search (FTS) on literals
  - Supports extended RDF models (quadruples and quintuples)
- SwiftOWLIM uses in-memory reasoning and query evaluation and is the fastest known RDF(S) and OWL engine
- BigOWLIM uses binary persistence and is the only engine proven to support non-trivial OWL inference against 12+ billion facts
- Many benchmarks on real and synthetic data sets (including LOD) are available
http://www.ontotext.com/owlim/
KIM
- Platform for semantic annotation, indexing and retrieval
- Automatic knowledge acquisition through built-in GATE-based text mining, or 3rd-party metadata generation for other content types (e.g. multimedia)
- Multi-paradigm navigation and retrieval on top of:
  - Text (FTS) with document structure and metadata restrictions
  - Conceptual models, instance base, and facts, taking advantage of inference
  - Co-occurrence of instances in contexts
  - Annotation patterns
  - Hybrid or multi-paradigm queries combining all of these
http://www.ontotext.com/kim
KIM Platform
[Architecture diagram. Labels: Disks, WWW, Intranet; Existing models, taxonomies, dictionaries, thesauri, schemata (O1, O2, O3); 3rd-party Ontology Editors; Document & Metadata Aggregator / Crawler; A, B, C, D; Knowledge Base Engineering; Convertors; Visual Interface; 3rd-party App; Semantic Indexing & Storing; Semantic Index; Multi-paradigm Search/Retrieval]
Need Definition
[Diagram. Labels: Natural Language; Semantic Search; Meaning; Structured Queries; Metadata; Patterns; Some meaning; FTS; words]
Retrieval Paradigms
- STRUCTURE: graph pattern search
- PATTERNS: predefined pattern searches; canned queries
- FACETS: co-occurrence-based search
- HYBRID: hybrid FTS, doc structure, metadata, entity lookup and co-occurrence search
- ANNOTATIONS: ANNIC-style, Mimir-powered search over annotation patterns
- BOOLEAN: FTS with restrictions on doc structure and metadata
- ONTOLOGY: conceptual model hierarchy observation
Retrieval: BOOLEAN
- Boolean operators
- Document structure
- Doc-level metadata
Structure
- Entity pattern queries over the ontology/KB graph structures
Patterns
Retrieval Paradigms: FACETS
- Facets of a specific class/type
- Filtering as you type
- Co-occurrence based
- Selection reduces the document set and the facet contents
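The facet behaviour listed above can be sketched over a toy corpus in which each document carries the set of entities found in it. The corpus, the entity types and the function names are hypothetical; a real implementation would sit on top of the semantic index rather than an in-memory dictionary.

```python
from collections import Counter

# Hypothetical corpus: document id -> set of (entity type, entity name).
DOCS = {
    "d1": {("Person", "Alice"), ("Organization", "Acme"), ("Location", "Sofia")},
    "d2": {("Person", "Alice"), ("Organization", "Initech")},
    "d3": {("Person", "Bob"), ("Organization", "Acme")},
}

def select(doc_ids, entity):
    """Keep only the documents in which `entity` occurs."""
    return {d for d in doc_ids if entity in DOCS[d]}

def facets(doc_ids):
    """Per entity type, count the entities co-occurring in `doc_ids`."""
    counts = {}
    for d in doc_ids:
        for etype, name in DOCS[d]:
            counts.setdefault(etype, Counter())[name] += 1
    return counts

def suggest(doc_ids, etype, prefix):
    """Filter-as-you-type: entities of `etype` whose name starts with `prefix`."""
    names = facets(doc_ids).get(etype, {})
    return sorted(n for n in names if n.lower().startswith(prefix.lower()))

ALL_DOCS = set(DOCS)
AFTER_ALICE = select(ALL_DOCS, ("Person", "Alice"))
```

Selecting Alice shrinks the document set to those mentioning her, and the facets recomputed over `AFTER_ALICE` no longer offer Bob, since he never co-occurs with Alice.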
Results: Entity Sets
- Sets of entities matching the query
- Hyperlinked to their semantic descriptions
- Further navigation
Results: Documents
- Doc set matching the query
- With some metadata and snippets
- Hyperlink to Doc Detail
Results: Doc Detail, Metadata
Results: Doc Details: Content
- Navigation
- Hyperlinked searched terms & entities
Some Applications
- ln.ontotext.com: small corpus of recent news with a focus on people, orgs, GPEs
- kim.ontotext.com: same types of entities, but for 1.2M news items from 2002-2007
- fda.semanticannotation.com/exopatent: drug-development patents analyzed for measurements, drugs, compounds, diseases, etc.
- ARIS: Asset Recovery Intelligence System
- Explanation of the corpora, setup, IE apps, search and everything else will be verbal only
- Demos and hands-on now
More at: http://www.ontotext.com