Information Systems & University of Koblenz Landau, Germany Semantic Search examples: Swoogle and Watson Steffen Staad credit: Tim Finin (swoogle), Mathieu d Aquin (watson) and their groups 2009-07-17
Types of semantic search engines Semantically-enhanced Search Yahoo! SearchMonkey, Google squared NLP-based Search MetaWeb Freebase, Powerset, Semantic-NLP-based Search hakia, Cognition, Computational-NLP-based Search True Knowledge, Wolfram Alpha, Search Swoogle, Sindice, SWSE, Falcon-S, Watson, Shoe, & 2of 52
Semantic Data Vs. Application needs Data Applications Discover how entities are related? Browse ontologies Gateway to Semantic Data Dynamically retrieving exploiting combining relevant semantic resources Find and rank ontologies Query multiple knowledgebases Answer NLP question & 3of 52
Semantic Search? Finding Ontologies? For reuse To build upon what exists To adopt what is used in practice Not to re-invent the wheel Because it is simpler than building from scratch For applications Because semantic applications need knowledge Because knowledge is hard to acquire Because some scenarios require to gather this knowledge at run-time Because in some scenarios, the more there is, the better & 4of 52
Swoogle first Gateway Data Applications Discover how entities are related Browse ontologies Answer NLP question Find and rank ontologies Query multiple knowledgebases & 5of 52
Swoogle Created in 2004 Crawls and discovers documents in RDF,OWL Indexing and retrieval system Search for Documents (SWD) Ontologies Instance data & 6of 52
Ontology Dictionary SWOOGLE 2 Some statistics service Swoogle Search IR analyzer Ontology Dictionary SWD analyzer Swoogle Statistics Web Server Web Service Human users Intelligent Agents SWDs Triples 2,978,838 705,078,866 analysis SWD Cache SWD Metadata Ontologies Over 10,000 digest SWD Reader discovery Candidate URLs Web Crawler The Web The Web SWD Rank A SWD s rank is a function of its type (SWO/SWI) and the rank and types of the documents to which it s related. Swoogle uses four kinds of crawlers to discover semantic web documents and several analysis agents to compute metadata and relations among documents and ontologies. Metadata is stored in a relational DBMS. Services are provided to people and agents. http://swoogle.umbc.edu/ SWD IR Engine Swoogle puts documents into a character n- gram based IR engine to compute document similarity and do retrieval from queries & 7of 52
How Swoogle can be used? Find ontologies Containing keywords, terms, concepts Similar to http://myontology.org/ Used to describe document X (directly or indirectly) Find SW documents Containing keywords, terms Used or created by specific institution Browse Ontologies using specific topic hierarchy Ontology metadata Entities and navigate between them & 8of 52
Swoogle architecture Uses Google APIs to discover URIs Crawls using these URIs as seeds Allows users to submit URIs Offers multiple interfaces Generates Metadata Used for ranking Basic RDF statistics and ontology annotation. RDF Statistics (determine SWD or SWO) Ontology Annotation & 9of 52
& 10 of 52
Limitations of Swoogle No quality control mechanisms Many ontologies are duplicated No quality information provided Limited Query/Search mechanisms No support for relations between ontologies Duplication, incompatibility (contradiction), modularization, versioning, etc. & 11 of 52
Watson Data Applications Discover how entities are related Watson Browse ontologies gateway Answer NLP question Find and rank ontologies Query multiple knowledgebases & 12 of 52
Watson design principles Information on ontology metrics Provides information on structure-based ontology metrics (e.g. richness of concepts / individuals, ontology population) Provides more semantic information for semantic applications (e.g. ontology topic relevance) to help discover, select, exploit and combine semantic resources Provides a variety of query and access mechanisms For both humans (web interface) and machines (web serv., API) To fit applications having different purposes and requirements Ranging from keyword search to ontology exploration and formal queries (SPARQL) Support for relations between ontologies Detecting redundancy, duplication, incompatibility (contradiction), modularization, versioning, etc. & 13 of 52
Watson architecture functional overview & 14 of 52
Watson: GUIs and APIs GUIs Several web-based interfaces for people APIs Important for application developers Lightweight web services Offers same or extended functionality as created GUIs & 15 of 52
Web Interface & Advanced Keyword Search 16 of 52
Web Interface Ontology Exploration & 17 of 52
Web Interface Ontology Metadata & 18 of 52
Web Interface Querying & 19 of 52
Applications of Watson Search for paths in ontologies Discover how entities are related Knowledge selection and modularization Find and use only the knowledge you need Natural language question answering Use data to analyze question and get answer Folksonomy enrichment Discover relationships between tags using ontology Semantic query expansion Generalize or specialize your queries & 20 of 52
Relation discovery & 21 of 52
Applications of Watson Search for paths in ontologies Discover how entities are related Knowledge selection and modularization Find and use only the knowledge you need Natural language question answering Use data to analyze question and get answer Folksonomy enrichment Discover relationships between tags using ontology Semantic query expansion Generalize or specialize your queries & 22 of 52
Knowledge Selection and Modularization Web t1 t2 t3 t4 t5 Ontology Selection t3 t2 t5 t1 t4 Ideal ontology includes - All search terms - Some additional knowledge tn The ideal world (Web) & 23 of 52
Knowledge Selection and Modularization t1 Web t3 t5 t4 t2 t3 t4 t5 tn Ontology Selection t2 tn t3 t1 t1 Ontology search results: Multiple ontologies Containing parts of interesting terms Much more additional knowledge (not so interesting for us) The real world (Web) & 24 of 52
Knowledge Selection and Modularization t1 Web t3 t5 t4 Ontology Modularization t3 t5 t4 t2 t3 t4 t5 Ontology Selection t2 tn t1 Ontology Modularization tn t2 t1 tn t3 t1 Ontology Modularization t1 t3 Knowledge Selection & 25 of 52
Knowledge Selection and Modularization t1 Web t3 t4 t5 Ontology Modularization t3 t5 t4 t2 t3 t4 t5 Ontology Selection t2 tn t1 Ontology Modularization tn t2 t1 Ontology Merging t2 t5 t3 tn t1 t4 tn t3 t1 Ontology Modularization t1 t3 Knowledge Selection & 26 of 52
Modularization and integration Query terms Found large ontology containing results and more Interesting ontology modules Ontology fragment interesting for us & 27 of 52
Applications of Watson Search for paths in ontologies Discover how entities are related Knowledge selection and modularization Find and use only the knowledge you need Natural language question answering Use data to analyze question and get answer Folksonomy enrichment Discover relationships between tags using ontology Semantic query expansion Generalize or specialize your queries & 28 of 52
PowerAqua Bridge the gap between the user and the : - Provide the user the capability to query the SW using Natural Language. Dynamically select and combine info drawn from the vast amount of heterogeneous semantic data to answer a user s query. & 29 of 52
PowerAqua 1. NL Question 2. Linguistic interpretation 3. Ontology based interpretation 4. Answer & 30 of 52
PowerAqua algorithm & 31 of 52
Applications of Watson Search for paths in ontologies Discover how entities are related Knowledge selection and modularization Find and use only the knowledge you need Natural language question answering Use data to analyze question and get answer Folksonomy enrichment Discover relationships between tags using ontology Semantic query expansion Generalize or specialize your queries & 32 of 52
FLOR: FoLksonomy Ontology enrichment Can the provide the structure needed to improve search and navigation of tagged spaces? & 33 of 52
FLOR methodology & 34 of 52
Search in Tag Spaces Let s find photos of animals which live in the water Query: Animal Water Dog Bird Land scape Cat Bird Dog Land scape Tiger Tiger Dog Bird Dog Bird Tiger Bird Bird Land scape Bird Tiger 5/24 21% precision & 35 of 52
Use ontologies for answering queries Animal Animal Water Mammal Fish livesin Body of Water <Animal livesin Water> FreshwaterFish SaltwaterFish livesin Sea Marine Mammal Dolphin Seal Sea Elephant Whale livesin Ocean <Dolphin> or <Seal> or < Sea Elephant > or <Whale> & 36 of 52
Results dolphin seal whale sea elephant 18/24 75% precision & 37 of 52
Applications of Watson Search for paths in ontologies Discover how entities are related Knowledge selection and modularization Find and use only the knowledge you need Natural language question answering Use data to analyze question and get answer Folksonomy enrichment Discover relationships between tags using ontology Semantic query expansion Generalize or specialize your queries & 38 of 52
Watson Web Documents & Expand query for web document with relevant ontology terms 39 of 52
Semantic query extensions An extension of a web search engine that suggests ways to extend a query thanks to online ontologies Example with the query researcher Suggests academic staff, Person, etc. as terms to generalize the query, and professor, PhD student as terms to specialize the query Without having to give the system any knowledge: everything comes from the Web! Versions: Gowgle (http://watson.kmi.open.ac.uk/gowgle): use the Google SOAP API and the Watson SOAP API Wahoo (http://watson.kmi.open.ac.uk/wahoo): use the Yahoo! REST API and the Watson REST API & 40 of 52
Query Result from Yahoo! Term suggestions Add/Replace & Screenshot of wahoo (REST based) 41 of 52
How it works? Find ontologies containing the keyword researcher http://watson.kmi.open.ac.uk/api/semanticcontent/keywords?q=researcher exactly researcher in the label or id of a class http://watson.kmi.open.ac.uk/api/semanticcontent/keywords?q=researcher& scope=ln+label&ent=class&match=exact Find entities corresponding to researcher in ontology http://watson.kmi.open.ac.uk/api/entity/keyword?q=researcher&uri=http:// calo.sri.com/core-plus-office&scope=ln+label&ent=class&match=exact Find subclasses and superclasses of an entity http://watson.kmi.open.ac.uk/api/entity/subclasses?ent=http://calo.sri.com/ core-plus-office#researcher&uri=http://calo. sri.com/core-plus-office The rest is interface stuff and call to Yahoo! & 42 of 52
Faceted browsing and ranking Faceted browsing / search great idea! Limitations? Huge number of entities / facets, but limited display capabilities What to show? What is important? & 43 of 52
Ranking with TripleRank Rank resources in the ontology w.r.t Link structure Authority modeling (HITS algorithm) Type of relationships between resources Additional dimension Based on tensor analysis and decomposition like: extension of matrix analysis and SVD Thomas Franz, Antje Schultz, Sergej Sizov, and Steffen Staab. TripleRank: Ranking SemanticWeb Data By Tensor Decomposition & 44 of 52
HITS by example & Traditional ranking with HITS algorithm 45 of 52
HITS by example Hyperlink-Induced Topic Search finds: Authorities (a): pages with good content about a topic, linked to by many hubs Hubs (h): pages that link to many good authority pages on a topic (directories) HITS is an iterative process to calculate hubs and authorities Equations Traditional ranking with HITS algorithm & 46 of 52
HITS by example Traditional ranking with HITS algorithm & 47 of 52
TripleRank Adjacency matrix for single property properties The same graph represented as tensor in TripleRank & 48 of 52
TripleRank tensor analysis Results of the tensor analysis: & 49 of 52
TripleRank results General linkage ranking Property-specific ranking & 50 of 52
TripleRank - ranked properties Evaluation interface Possible use for faceted browsing using properties & 51 of 52
Conclusions Search is one of the most useful services need such services too! Find relevant ontologies Reuse knowledge Semantic search is much more focused than traditional (syntactic) Different types of results and appropriate presentation Richer metadata Query understanding (NLP) and reasoning Number of semantic search engines is growing Need both for google for semantics and for very specialized services (e.g. medicine) & 52 of 52