Semantic Searching
John Winder
CMSC 676, Spring 2015

Semantic Searching
- searching and retrieving documents by their semantic, conceptual, and contextual meanings
- Motivations:
  - to do disambiguation
  - to improve retrieval accuracy (precision and recall)
  - to unite the Semantic Web

Semantic Web
- Standardizations
  - data formats and schemas: XML, RDF
  - query languages: RDQL, SPARQL
- Key Ideas
  - metadata
  - ontologies
- Figure: The Semantic Web Stack

Ontology
- An ontology is a knowledge base
  - models hierarchies, relationships (is-a, has-a)
  - uses formal languages (inspired by databases)
- Examples:
  - A query to retrieve a list of paintings and their painters.
  - WordNet (dog is-a canine is-a carnivore, etc.)
  - ConceptNet

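The is-a hierarchy and the paintings query can be illustrated with a toy triple store. This is only a minimal sketch: the triples, the `painted-by` predicate, and the function names are assumptions made for illustration, and a real system would use RDF data with a query language such as SPARQL.

```python
# Toy ontology as (subject, predicate, object) triples.
# The predicate names and the facts listed here are illustrative assumptions.
TRIPLES = [
    ("dog", "is-a", "canine"),
    ("canine", "is-a", "carnivore"),
    ("Mona Lisa", "is-a", "painting"),
    ("Mona Lisa", "painted-by", "Leonardo da Vinci"),
    ("The Starry Night", "is-a", "painting"),
    ("The Starry Night", "painted-by", "Vincent van Gogh"),
]


def ancestors(term, triples=TRIPLES):
    """Follow is-a edges upward and return every more general concept."""
    found = []
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in triples:
            if subj == current and pred == "is-a" and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


def paintings_and_painters(triples=TRIPLES):
    """Join two patterns: ?x is-a painting, and ?x painted-by ?who."""
    paintings = {s for s, p, o in triples if p == "is-a" and o == "painting"}
    return sorted((s, o) for s, p, o in triples
                  if p == "painted-by" and s in paintings)
```

Calling `ancestors("dog")` walks the chain dog, canine, carnivore, while `paintings_and_painters()` plays the role of the example query by joining the two triple patterns.
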
Main Advancements
- Boolean model
  - no partial matching
  - no clear ranking method
- Vector Space model
  - requires parallel metadata
  - rank by TF-IDF
- Semi-structured Vector Space
  - Semantic Search System by Vallet et al. [2005]
  - fully structured ontological mapping has worse recall
  - keyword searching is flexible but has worse precision

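The contrast between Boolean matching and TF-IDF ranking can be sketched in a few lines. The three documents, the whitespace tokenizer, and the unsmoothed `log(N / df)` weighting are assumptions chosen to keep the example small; this is not the weighting scheme of any system cited here.

```python
import math
from collections import Counter

# Hypothetical mini-corpus, made up for illustration.
DOCS = {
    "d1": "semantic search uses ontologies",
    "d2": "keyword search ranks documents by keyword frequency",
    "d3": "ontologies model concepts and relationships",
}


def tokenize(text):
    return text.lower().split()


def boolean_and(query, docs=DOCS):
    """Boolean AND: a document matches only if it contains every query term."""
    terms = set(tokenize(query))
    return sorted(d for d, text in docs.items() if terms <= set(tokenize(text)))


def tfidf_rank(query, docs=DOCS):
    """Rank documents by cosine similarity between TF-IDF vectors."""
    tokens = {d: tokenize(text) for d, text in docs.items()}
    df = Counter(term for toks in tokens.values() for term in set(toks))
    idf = {term: math.log(len(docs) / count) for term, count in df.items()}

    def vector(toks):
        tf = Counter(toks)
        # Terms unseen in the corpus have no IDF and are dropped.
        return {t: tf[t] * idf[t] for t in tf if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q = vector(tokenize(query))
    scores = {d: cosine(q, vector(toks)) for d, toks in tokens.items()}
    # Keep only partial or full matches, best score first.
    return sorted(((d, s) for d, s in scores.items() if s > 0),
                  key=lambda pair: (-pair[1], pair[0]))
```

For the query `"semantic keyword"`, `boolean_and` returns nothing because no single document contains both terms, whereas `tfidf_rank` still returns `d1` and `d2` as ranked partial matches.
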
Main Advancements (cont.)
- Query Expansion
  - searching by meaning, beyond literal keywords
  - given a query, map into ontology, find new relations
  - returns documents even without search keywords being present in the documents
  - examples:
    - presidents of the French government
    - reports on flooding for cities in Asia with populations under 50,000

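The flooding example can be sketched as a lookup into an ontology followed by an ordinary keyword match on the expanded terms. Everything in this sketch is hypothetical: the city names, the populations, the reports, and the `expand` and `search` helpers are invented for illustration.

```python
# Hypothetical mini-ontology: the cities and populations are made up.
ONTOLOGY = {
    "Asia": {"cities": ["Avalon", "Brightwater", "Cedarholm"]},
}
POPULATION = {"Avalon": 32_000, "Brightwater": 480_000, "Cedarholm": 12_500}

# Hypothetical reports; none of them mentions "Asia" or "population".
DOCS = {
    "r1": "Flooding closed roads in Avalon after heavy rain",
    "r2": "Brightwater flooding displaced thousands",
    "r3": "Cedarholm council reviews budget",
    "r4": "River flooding reported near Cedarholm",
}


def expand(region, max_population):
    """Map the query constraint into the ontology and return matching cities."""
    return [city for city in ONTOLOGY[region]["cities"]
            if POPULATION[city] < max_population]


def search(topic, region, max_population, docs=DOCS):
    """Return reports that mention the topic and any city from the expansion."""
    cities = [city.lower() for city in expand(region, max_population)]
    hits = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        if topic in words and any(city in words for city in cities):
            hits.append(doc_id)
    return sorted(hits)
```

`search("flooding", "Asia", 50_000)` returns `r1` and `r4` even though neither report contains the words "Asia", "cities", or "population"; the match comes entirely from the entities found in the ontology.
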
Main Advancements (cont.)
- Generating queries
  - search by keyword
  - parses out entity/relations
- Semantic Ranking
  - by entity: ReConRank in SWSE
  - by relationship
  - by document annotations (Swoogle)
- Ontology-Based Semantic Search System by Fernandez et al. [2011]

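Ranking by entity can be illustrated with link analysis over a graph of entities. The sketch below is a generic PageRank-style power iteration used as a stand-in; it is not ReConRank's actual procedure, and the graph, damping factor, and iteration count are assumptions.

```python
# Hypothetical entity graph: each key links to the entities in its list.
ENTITY_LINKS = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
}


def pagerank(links, damping=0.85, iterations=50):
    """Generic PageRank by power iteration over a dict of outgoing links."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node starts each round with the random-jump share.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = links.get(n, [])
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A node with no outgoing links spreads its rank evenly.
                for t in nodes:
                    new_rank[t] += damping * rank[n] / len(nodes)
        rank = new_rank
    return rank
```

With `ENTITY_LINKS`, entity `c` receives links from both `a` and `b` and ends up ranked highest, while `b`, which nothing links to, ranks lowest.
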
Mímir: Semantic Search at Scale (2015)
- Mímir, annotation-based semantic search
- uses GATE to do NLP, extract entities/relationships
- open source, distributed (federated) system
- complex query parsing, indexing at three levels: tokens, annotations, sub-annotations
- applied to real-world corpora (over 150 million docs)
  - immunology dataset
  - patent dataset, searching for prior art

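Indexing at several levels can be pictured as keeping one inverted index per level and intersecting their postings at query time. This sketch does not use the Mímir or GATE APIs; the documents, the annotation types, and the reading of a sub-annotation as a feature value attached to an annotation are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical pre-annotated documents; a real pipeline would produce the
# annotations with NLP tooling rather than by hand.
DOCS = {
    "p1": {
        "tokens": ["antibody", "binds", "receptor"],
        "annotations": [("Protein", {"role": "antibody"})],
    },
    "p2": {
        "tokens": ["patent", "claims", "antibody", "assay"],
        "annotations": [("Document", {"kind": "patent"}),
                        ("Protein", {"role": "antigen"})],
    },
}


def build_indexes(docs):
    """Build one inverted index per level: token, annotation, sub-annotation."""
    token_idx = defaultdict(set)
    ann_idx = defaultdict(set)
    sub_idx = defaultdict(set)
    for doc_id, doc in docs.items():
        for tok in doc["tokens"]:
            token_idx[tok].add(doc_id)
        for ann_type, features in doc["annotations"]:
            ann_idx[ann_type].add(doc_id)
            for key, value in features.items():
                sub_idx[(ann_type, key, value)].add(doc_id)
    return token_idx, ann_idx, sub_idx


def query(indexes, token=None, annotation=None, feature=None):
    """Intersect the postings of whichever levels the query constrains."""
    results = None
    for idx, key in zip(indexes, (token, annotation, feature)):
        if key is None:
            continue
        postings = set(idx.get(key, set()))
        results = postings if results is None else results & postings
    return sorted(results or [])
```

After `indexes = build_indexes(DOCS)`, the call `query(indexes, token="antibody")` matches both documents, while adding `feature=("Protein", "role", "antigen")` narrows the result to `p2`.
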
Future Applications
- Recommender Systems
  - build user profiles, use history to inform results
- Sentiment Analysis
  - disambiguation to spot outliers in word usage
- Reasoning (Artificial Intelligence)
  - inference: discovering new facts
  - using ontologies to build... more ontologies

References
- Castells, Pablo, Fernandez, Miriam, and Vallet, David. An adaptation of the vector-space model for ontology-based information retrieval. Knowledge and Data Engineering, IEEE Transactions on, 19(2):261-272, 2007.
- Cunningham, Hamish, Maynard, Diana, Bontcheva, Kalina, and Tablan, Valentin. GATE: an architecture for development of robust HLT applications. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 168-175. Association for Computational Linguistics, 2002.
- Fernandez, Miriam, Cantador, Ivan, Lopez, Vanesa, Vallet, David, Castells, Pablo, and Motta, Enrico. Semantically enhanced information retrieval: an ontology-based approach. Web Semantics: Science, Services and Agents on the World Wide Web, 9(4):434-452, 2011.
- Havasi, Catherine, Speer, Robert, and Alonso, Jason. ConceptNet 3: a flexible, multilingual semantic network for common sense knowledge. In Recent Advances in Natural Language Processing, pp. 27-29, 2007.
- Hogan, Aidan, Harth, Andreas, Umbrich, Jürgen, Kinsella, Sheila, Polleres, Axel, and Decker, Stefan. Searching and browsing linked data with SWSE: The semantic web search engine. Web Semantics: Science, Services and Agents on the World Wide Web, 9(4):365-401, 2011. ISSN 1570-8268. doi: http://dx.doi.org/10.1016/j.websem.2011.06.004. JWS special issue on Semantic Search.
- Miller, George A. WordNet: a lexical database for English. Communications of the ACM, 38(11):39-41, 1995.
- Russell, Stuart and Norvig, Peter. Artificial intelligence: A modern approach. Prentice-Hall, Englewood Cliffs, 3, 2003.
- Styltsvig, Henrik Bulskov. Ontology-based information retrieval. PhD thesis, Roskilde University, Denmark, 2006.
- Tablan, Valentin, Bontcheva, Kalina, Roberts, Ian, and Cunningham, Hamish. Mímir: An open-source semantic search framework for interactive information seeking and discovery. Web Semantics: Science, Services and Agents on the World Wide Web, 30:52-68, 2015. ISSN 1570-8268. doi: http://dx.doi.org/10.1016/j.websem.2014.10.002. Semantic Search.
- Vallet, David, Fernandez, Miriam, and Castells, Pablo. An ontology-based information retrieval model. In The Semantic Web: Research and Applications, pp. 455-470. Springer, 2005.