State of the art on Semantic Question Answering

Size: px

Start display at page:

Download "State of the art on Semantic Question Answering"

Tracey Hood
5 years ago
Views:

1 State of the art on Semantic Question Answering A Literature Review Technical Report kmi May 2007 Vanessa Lopez, Enrico Motta, Victoria Uren and Marta Sabou 1

2 Literature review and state of the art on Semantic Question Answering Vanessa López, Supervisors: Enrico Motta, Victoria Uren, and Marta Sabou Knowledge Media Institute, The Open University. Walton Hall, Milton Keynes,MK7 6AA, United Kingdom. {v.lopez, e.motta, v.s.uren, 1 Introduction and outline The goal of Question Answering (QA) systems as described by [38] is to allow users to ask questions in natural language, using their own terminology, and receive a concise answer, possible with enough validating context. As stated in [82] the different kind of searches mode provides complementary affordances: For example, keyword search and natural language querying offered easy routes in for users, but don t support exploration of the semantic search space, whereas view based search and forms can help the user explore space, but become tedious to use in large spaces and impossible in heterogeneous ones. The development of a semantic layer on top of web contents and services, the Semantic Web [10], has been recognized as the next step in the evolution of the World Wide Web as a distributed knowledge resource that brings to the Web the idea of having data formally defined and linked. The semantic web vision [10] is one in which rich, ontology-based semantic markup is widely available, enabling us to improve content by adding meta-information. This means that with structured or semi-structured documents, texts can be semantically marked-up and ontological support for term definition provided, both to enable sophisticated interoperability among agents, e.g., in the e-commerce area, and to support human web users in locating and making sense of information. Ontologies play a crucial role on the SW: they provide the conceptual infrastructure supporting semantic interoperability, addressing data heterogeneity [1] and opening up opportunities for automated information processing. However, because of the SW s distributed nature, data will inevitably be associated with different ontologies and therefore ontologies themselves will introduce heterogeneity. Different ontologies may describe similar domains, but using different terminologies, while others may have overlapping domains: i.e. given two ontologies, the same entity can be given different names or simply be defined in different ways. As argued in [66] the two processes of building the require infrastructure to produce a large semantic web and the smart applications to exploit such a markup (or at least, demonstrator) have to go hand in hand. Until recently, only a limited amount of semantic data was available, therefore, as stated in [66] the result of this strategy so far has been that, by and large, the early demonstrators produced in the past few years lack many of the key elements that will characterize real SW applications that will operate in an open, large-scale, distributed and heterogeneous environment. Consequently they are more akin to traditional knowledge-based systems. However, as the SW is gaining momentum, there is now a reasonable amount of online distributed semantic data 1. Language technologies and the Semantic Web can mutually benefit from each other [15]. The relevant literature for the research field of question answering (QA) over multiple ontologies, motivated by the emerging semantic web (SW) scenario, covers and combines many different areas. Therefore, first of all this chapter presents a general background to the research field of question answering systems, its aims, issues, history and typical architecture from natural language interfaces to databases, through an overview of the question answering track in the Text Retrieval Conference (TREC) and current approaches for open domain question answering, to semantic question answering using ontological information. However, when talking about semantic QA, the goal is to improve traditional methods by exploiting semantic data. These traditional methods involve not only typical QA techniques but also many other research areas, like semantic 1 (Lee & Goodwin, 2004) registered a 300% growth in 2004 alone and thus outpacing the growth of the web. The semantic search engine Swoogle (Ding et al, 2005) claims to index over 10,000 ontologies (March 07). Also, there is research work on the direction of integrating the widely adopted Folksonomies (tagging) with the formal semantics provided by the ontologies. 2

3 similarity measures, ontology selection, ranking and mapping algorithms, play an important role. Here we analyze all these inter-related areas from the point of view and dimensions of QA over semantic data, e.g. from traditional approaches that model semantic similarity by computing the semantic distance between definitions within a single ontology that is either a domain independent ontology or the result of the integration of existing ontologies, to determining semantic similarity among entities from different ontologies. 2 The goals and dimensions of Semantic Question Answering Systems. In a roadmap document for QA [38, 18] five standards or criteria for QA systems were identified: 1- Answers should be in real time, regardless of the complexity of the question, the size and multitude of the data sources or the ambiguity of the question. 2- Imprecise, incorrect answers are worse than no answers. The system is expected to deal with contradictions. There will be no answer if this one is not in the knowledge sources. 3- Usability, the knowledge of the system should satisfy the needs of the user 4- Answers distributed across multiple sources that require fusion techniques that combine partial answers from different sources should be coherent 5- The answer should be relevant within a specific context / task. Context can be used to clarify a question and resolve ambiguities. The system evaluation must be user-centered. The dimensions of QA systems are based on the different sources used to generate an answer: 1. Natural Language interface to databases use structured data 2. QA over semi-structured data (for instance health records, yellow pages) 3. Open Question Answering (the focus of most current research) over free text, as in the case of the TREC collection or the Web. 4. Over semantic data (ontologies). In addition, the source can be annotated images or even video. Another distinction between QA applications is whether they are domain-specific or domain-independent. Traditional, NLIDB are domain-specific while open question answering is domain-independent. Furthermore, current research in question answering focuses on factual question answering, where we can distinguish between wh-queries (who, what, how many, ) or commands (name all ) requiring an element or list of elements as an answer or an affirmation / negation type of questions. As pointed out in [40] more difficult kind of questions include those which ask for opinion, like Why and How questions, which require understanding of causality or instrumental relations, What questions which provide little constrain in the answer type, or definition questions. In [65] QA systems are classified into five increasingly sophisticated types of questions: systems based on factoids, systems with reasoning mechanisms, systems that deal with temporal reasoning, systems that fuse answers from different sources, interactive (dialog) systems and systems capable of analogical reasoning. The traditional intrinsic problems that QA systems try to solve are mapping the user terminology into the terminology used by the sources by using similarity measures (which require to handle unknown vocabulary) without affecting portability. Moreover, independently of the type of query, any non trivial NL QA system has to deal with ambiguity. Querying semantic information directly is still a novel research area, however the results from many popular research areas like ontology selection and ranking (how well they satisfy user queries), word sense disambiguation ( squash can be a sport or a vegetable), co-reference of instances and ontology mapping (e.g., mapping the class car in one ontology to the automobile class in another) can be applied. Concerning the querying environment there are also problems which are intrinsic to the large and dynamic semantic environment. The complexity arises because of its openness in which multiple heterogeneous and distributed ontologies from the same or different domain may describe similar or overlapping topics with different terminologies. A study presented in [82] about the potential size of the SW reveals that the SW is currently growing at the same rate than the Web in the early As stated on [82] semantic search systems should be able to support both large-scale ontologies (with many classes and instances) and large scale repositories (with many annotations). An example of very large ontologies with real world instance data are the SWETO ontology [4] which contains around 3

800000 entities and 1600000 relationships, and very large RDFs like the conversion of the English Wikipedia into RDF 2, which is a monthly updated dataset containing roughly 47 million triples.

4 entities and relationships, and very large RDFs like the conversion of the English Wikipedia into RDF 2, which is a monthly updated dataset containing roughly 47 million triples. An extended discussion of these issues of open semantic environment is presented in [67]. Furthermore, as stated on [82] iterative and exploratory search modes are important to the usability of all search systems, as well as the ability to provide justifications for an answer or suggest the presence of unrequested but related information or to propose alternate paths of exploration to the user. Usability is not covered in this review so for extended information see [82]. Also, as stated on [47] An often proposed solution to help casual users is the use of natural language interfaces. Such tools, however, suffer from one of the biggest problems of natural language interfaces: ambiguities. Furthermore, the systems are hardly adaptable for new domains. User s can do a lot more, as they can usually disambiguate between a range of possible factoid questions and answers and / or navigate information clusters in an answer space. Sources -Structured (databases) -Semi-structured (documents) -Open (free text) -Semantic (ontologies and KBs) Search environment - Large Scale - Heterogeneity -Openness -Cross-lingual Types of questions -Definitions -Factoids (wh-, affirm/negat) - Understanding and causality reasoning (why and how) -Temporal and spatial reasoning -Facts from different sources -World/domain knowledge and common sense reasoning -Interactive dialogs Intrinsic traditional problems - Ambiguity -Portability -Mapping -Ranking Challenges for the future: Figure 1. The dimensions of semantic questions answering NL QA approaches have largely been focused on retrieving the answer from raw text, therefore much work on ontology-driven QA tends to focus on the use of ontologies to support query expansion in information retrieval [59] rather than exploiting semantic metadata. However, we believe that the availability of distributed semantic markup on a large scale opens the way to novel QA systems. The novelty of these systems with respect to traditional QA systems is that they can make use of such semantic information to make sense of the user query to provide precise answers to questions posed in NL, without assuming that the user has any prior information about the semantic resources. In particular, a knowledge based QA system can help with answering questions where multiple pieces of information need to be inferred and combined at run time, rather than simply having a pre-written paragraph of text retrieved [21]. In this review we focus on all the relevant research areas related to the QA dimensions stated on Fig. 1 in an attempt to bridge the gap between current QA systems and the future development of QA systems which can search heterogeneous semantic metadata on the challenging open semantic web scenario. Or in other words, to move beyond domain specific semantic QA (like traditional NLIDB over structured data) to domain independent open QA over structured semantic information

5 Ontologies can not solve all the NLP tasks, as they are very basic in the way they express quantifications and have limited capability to reason about temporal and spational issues. Although simple questions formed with how-long or when, like when did the akt project start? can be handled, ontologies cannot cope with expressions like in the last year. This is because the structure of temporal data is domain dependent; compare geological time scales to the periods of time in a knowledge base about research projects. There is a serious research challenge in determining how to handle temporal data and causality in a way which would be portable across ontologies. However, they are a powerful tool to provide semantics, and in particular, they can be used to be able to move beyond single facts to answers built from multiple sources. We predict that intelligent behaviour when questioning the semantic web, will be a side effect of the ability of such systems to handle and find interesting connections, on a very large scale metadata from heterogeneous sources, in a meaningful way. Our interest lies on QA over semantic data. We define semantic metadata as RDF triples based on some semantic web compatible ontology. In the rest of the paper will be review the state-of-the-art systems and algorithms that covers the different dimensions on the semantic QA scenario with respect to the expected standards. For this review, we just concentrate on factual questions from different sources NLIDB: NATURAL LANGUAGE INTERFACES TO DATABASES This scenario is of course very similar to asking natural language queries to databases (NLIDB), which has long been an area of research in the artificial intelligence and database communities even if in the past decade it has somewhat gone out of fashion [6]. However, it is our view that the SW provides a new and potentially important context in which the results from this area can be applied. The use of natural language to access relational databases can be traced back to the late sixties and early seventies [6] 3. The first QA systems were developed in 1960s and they were basically NL interfaces to expert systems, tailored to specific domains, being the most famous ones BASEBALL, and LUNAR. Both systems were domain specific, the former answer questions about the US baseball league over the period of one year, the later answer question about the geological analysis of rocks returned by the Apollo missions. LUNAR was able to answer 90% of the questions in its domain when posed by untrained people. In [6] a detailed overview of the state of the art for these systems can be found. Most of the early NLIDBs systems were built having a particular database in mind, thus they could not be easily modified to be used with different databases or were difficult to port to different application domains (different grammars, hard-wired knowledge or mapping rules had to be developed). Configuration phases were tedious and required a long time. These earlier QA systems had a core hand-crafted database or knowledge system hand-written by the domain experts. Some of the early NLIDBs relied on pattern-matching techniques. In the example described by Androutsopoulos in [6], a rule says that if a user s request contains the word capital followed by a country name, the system should print the capital which corresponds to the country name, so the same rule will handle what is the capital of Italy?, print the capital of Italy, Could you please tell me the capital of Italy. The shallowness of the pattern-matching would often lead to bad failures but it has also been unexpectedly effective techniques to for exploiting the Web as a data source. Other approaches are based on statistical or semantic similarity. For shorter answers, NLP-based systems have generally been better than statistical based ones, although the latter are surprisingly competitive. For example, FAQ Finder [19] is a natural language QA system that uses files of FAQs as its knowledge base; it also uses WordNet to improve its ability to match questions to answers, using two metrics: statistical similarity and semantic similarity. However, the statistical approach is unlikely to be useful to semantic QA because it is generally accepted that only long documents with large quantities of data have enough words for statistical comparisons to be considered meaningful [19]. Semantic similarity scores rely on finding connections through WordNet between the user s question and the answer. The main problem here is the inability to cope with words that are not explicitly found in the KB. The next generation of NLIDBs used an intermediate representation language, which expressed the meaning of the user s question in terms of high level concepts, independent of the database structure [6]. TEAM [57] is an 3 Example of such systems are: LUNAR, RENDEZVOUS, LADDER, CHAT-80, TEAM, ASK, JANUS, INTELLECT, BBn s PARLANCE, IBM s LANGUAGEACCESS, Q&A Symantec, NATURAL LANGUAGE/ DATATALKER, LOQUI, ENGLISH WIZARD, MASQUE 5

6 experimental, transportable NLIDB (front end portable) developed in The TEAM QA system consists of two major components: (1) for mapping NL expressions into formal representations; (2) for transforming these representations into statements from a database, making a separation of the linguistic process and the mapping process into the KB. To improve portability, TEAM requires separation of domain-dependent knowledge (to be acquired for each new database) from the domain-independent parts of the system. In TEAM the logical form constructed constitutes an unambiguous representation of the English query. The latest approach in generation of NLIDB systems based on formal semantics (1991) presented in [26] made a clear separation between the NL front ends, which have a very high degree of portability, and the back end. The front end provides a mapping between sentences of English and expressions of a formal semantic theory, and the back end maps these into expressions which are meaningful with respect to the domain in question. Adapting a developed system to a new application will only involve altering the domain specific back end. MASQUE/SQL [5] is a portable NL front-end to SQL databases (1993). The semi-automatic configuration procedure uses a built-in domain editor which helps the user to describe the entity types to which the database refers, using an is-a hierarchy, and then to declare the words expected to appear in the NL questions and to define their meaning in terms of a logic predicate that is linked to a database table/view. However, this still entities an intensive configuration procedure. More recent work in the area (2003) can be found in [71]. PRECISE [71] maps questions to the corresponding SQL query by identifying classes of questions that are easy to understand in a well defined sense: the paper defines a formal notion of semantically tractable questions. Questions are sets of attribute/value pairs and a relation token corresponds to either an attribute token or a value token. Each attribute in the database is associated with a wh-value (what, where, etc.). In PRECISE, a lexicon is used to find synonyms. However, in PRECISE the problem of finding a mapping from the tokenization to the database requires that all tokens must be distinct; questions with unknown words are not semantically tractable and cannot be handled. As a consequence, PRECISE will not answer a question that contains words absent from its lexicon. Using the example suggested in [71], the question what are some of the neighborhoods of Chicago? cannot be handled by PRECISE because the word neighborhood is unknown. When tested on several hundred questions, 80% of them were semantically tractable questions which PRECISE answered correctly and the other 20% were not handled and a paraphrase was requested. Further discussion on advantages, disadvantages and challenges with respect to semantic QA: Since the development of the first rich in NL understanding QA system LUNAR (a syntax-based system where the parsed question is directly mapped to a database expression by the use of rules) there have been improvements in the availability of lexical knowledge bases, such as WordNet, String Distance Metrics for Name-Matching Tasks like the open-source API from Carnegie Mellon University [22], and shallow, modular and robust NLP systems, like GATE [24]. Compare it with the latest work on NLIDB, semantic QA can benefit form the ontology semantics and generic thesaurus in a way that: (1) Later NLDBI systems use intermediate representations therefore although the front end is portable (as Copestake [23] states the grammar is, to a large extent, general purpose, it can be used for other domains and for other applications ) the back end is dependent on the database, so normally longer configuration times are required. However, in the case of semantic QA systems, the processes, both the linguistic and the one that access to the knowledge bases, can potentially be not only ontology portable but ontology independent, where the configuration to change the domain would be almost zero. Optionally, on these semantic systems some manual configuration can be allowed to optimize performance (i.e. for very specific or precise domains). oreover, the user can be required to be involve in the disambiguation process (when the ontology semantics does not provide further way to disambiguate) and mechanisms to learn from the user interaction and its jargon (i.e. based on case base reasoning) can be implemented. (2) If a word is lexically very dissimilar to the word used by the user, and it does not appear in any manually or automatic created lexicon, instead of failing to provide an answer, in semantic QA the ontology can be used to study the ontology neighborhood of the other terms in the query, which may lead us to the value of the term or relation we are looking for. In many cases this would be all the information needed to interpret a query 6

7 Summarizing, as stated on [46] we can say that most existing NLis to databases that allow full NL input are in almost every case restricted to the domain queried by the database. As [46] states the applications of NLIs to SW ontologies is not as common as expected, which is surprising given the semantic information that is enclosed in ontologies and could be exploited in query analysis and translation OPEN DOMAIN QUESTION ANSWERING We have already pointed out that research in NLIDB is currently a bit dormant, therefore it is not surprising that most current work on QA, which has been rekindled largely by the Text Retrieval Conference (see examples below), is somewhat different in nature from querying ontologies and knowledge bases. The Text Retrieval Conference (TREC) is sponsored by the American National Institute (NIST) and the Defense Advanced Research Projects Agency (DARPA). TREC tracks on open domain question answering were introduced in 1999 (TREC-8). The ARDA's Advanced Question and Answering for Intelligence funded AQUAINT program is a multi-project effort to improve the performance of QA systems over free large heterogeneous collections of structured and unstructured text or media. TREC tests are a warm-up for research on more ambitious forms of QA supported by the AQUAINT program. Given the large, uncontrolled text files and the very weak world knowledge available from WordNet and gazetteers and so on, these systems have performed extremely well, e.g. the LCC system [63] that uses a deeper linguistic analysis and iterative strategy obtained an score of by answering correctly 415 questions of 500 in TREC-11 (2002). However, there are linguistic problems common in most kinds of NL understanding systems (for instance, all question understanding and processing systems are required to recognize equivalent questions, regarding idiomatic forms). Some of the shallow methods used keyword-based techniques to locate interesting sentences from the retrieved documents based on the presence of the answer type. Ranking is based on syntactic features such as word order or similarity to the query. Templates can be used to find answers that are just reformulations of the question. More sophisticated syntactic, semantic and contextual processing to construct an answer might include: named-entity recognition, relation extraction, coreference resolution, syntactic alternations, word sense disambiguation, logical inferences, temporal or spational reasoning and so on. Furthermore, most of the systems classify the query based on the type of the answer expected: a name (i.e. person, organization), a quantity (monetary value, distance, length, size,..) or a date. Question s classes are arranged hierarchically in taxonomies and different types of questions require different strategies. These systems often utilize world knowledge that can be found in high level ontologies such as WordNet, or the Suggested Upper Merged Ontology (SUMO) The current state-of-the-art open QA systems consist on modules which retrieve information from documents, parse sentences and filter small text fragments that contain strings of the same type as the expected answer, pinpoint question types, analyze semantics, reason, access external resources, extract and rank answers. Going more in detail, question answering applications for text typically involve two steps, as cited by Hirschman [40]: 1. Identifying the semantic type of the entity sought by the question ; 2. Determining additional constraints on the answer entity. Constraints can include, for example, keywords (that may be expanded using synonyms or morphological variants) to be used in matching candidate answers; and syntactic or semantic relations between a candidate answer entity and other entities in the question. Various systems have, therefore built hierarchies of question types based on the types of answers sought [64, 80, 39,87]. For instance, in LASSO [64] a question type hierarchy was constructed from the analysis of the TREC-8 training data, in which a score of 55.5% for short answers and 64.5% for long answers was achieved. Given a question, it can find automatically (a) the type of question (what, why, who, how, where), (b) the type of answer (person, location..), (c) the question focus, defined as the main information required by the interrogation (very useful for what questions which say nothing about the information asked for by the question). Furthermore, it identifies the keywords from the question. Occasionally, some words of the question do not occur in the answer (for example, if the focus is day of the week it is very unlikely to appear in the answer). Therefore, it implements similar heuristics to the ones used by named entity recognizer systems for locating the possible answers. Named entity recognition and information extraction (IE) are powerful tools in question answering. One study showed that over 80% of questions asked for a named entity as a response [80]. In that work, Srihari and Li argue that: (i) IE can provide solid support for QA; (ii) Low-level IE is often a necessary component in handling many types of questions; (iii) A robust natural language shallow parser provides a structural basis for handling 7

8 questions; (iv) High-level domain independent IE is expected to bring about a breakthrough in QA. Where point (iv) refers to the extraction of multiple relationships between entities and general event information like WHO did WHAT. As described in [80], NE is necessary but not complete in answering questions because NE by nature only extracts isolated individual entities from text, therefore methods like the nearest NE to the queried key words are used. An efficient system should group together equivalent question types independently of how the query is formulated. Most of the open-domain QA systems classify questions according to their answer target. For example, in Wu et al. s ILQUA system [86] these are categories like person, location, date, region or subcategories like lake, river. The best results of the TREC9 [25] competition were obtained by the FALCON system described in Harabaigiu et al. [37] where it generates a score of 58% for short answers and 76% for long answers. In FALCON the answer semantic categories are mapped into categories covered by a Named Entity Recognizer. When the question concept indicating the answer type is identified, it is mapped into an answer taxonomy. The top categories are connected to several word classes from WordNet. In an example presented in [37], FALCON identifies the expected answer type of the question what do penguins eat? as food because it is the most widely used concept in the glosses of the subhierarchy of the noun synset {eating, feeding}. All nouns (and lexical alterations), immediately related to the concept that determines the answer type, are considered among the keywords. Also, FALCON gives a cached answer if the similar question has already been asked before; a similarity measure is calculated to see if the given question is a reformulation of a previous one. Other NL search engines such as AskJeeves [7] and EasyAsk [28] exist, which provide NL question interfaces to the Web but retrieve documents, not answers. AskJeeves relies on human editors to match question templates with authoritative sites; systems such as START [44], REXTOR [45] and AnswerBus [89], whose goal is also to extract answers from text. Further discussion on advantages, disadvantages and challenges with respect to semantic QA: There are many similarities between open-domains systems and semantic question answering; both would need to attempt to find synonyms, plus their morphological variants, for the terms or keywords. Also in both cases, at times, the rules keep ambiguity unresolved and produce non-deterministic output for the asking point (for instance, who can be related to a person or to an organization ). The main two differences between are: (1) Open domain QA classify queries based on hierarchies of question types based on types of answer sought. In semantic QA there is not need for building hierarchies, or heuristics to recognize named entities, as the semantic information needed is in the ontology. (2) Semantic systems can greatly benefit from exploiting the ontological relationships in order to understand a query. (3) In semantic systems the classification on the query can be based on the kind of answer and semantic equivalent representation (relationships between the terms) of the question rather than only on the type of the answer expected (categories like person, location, date, region or subcategories like lake, river). Therefore, there is not need to build complex hierarchies of answers. In a semantic system, different queries, formed by what is, who, where, tell me, etc., may belong to the same category, e.g. who works in the semantic web area? is structurally similar to which research areas are covered by semantic question answering?, and both require as an answer a list of elements of the type of the ground term that occur in the relationship. (4) Some kinds of questions are harder than others when querying free text. The ontology simplifies the way to handle what-is queries, in which the type of the answer expected in unknown, because the possible answer types are constrained by the types of the possible relations in the ontology. 8

9 The use of triples on the literature The semantic web is represented by triples, so a system to target the Semantic Web should be able to handle and find the ontology triples that can answer a user query. Moreover, for semantic QA to be effective, most of the questions need to be translated into one or more ontology compliant triples. A triple is generally equivalent to a logical form (where the operator is the semantic relation, although is not strictly required). First of all, as pointed out by Katz et al. [44], although not all possible queries can be represented in the binary relational model, in practice these exceptions occur very infrequently. Secondly, RDF-based knowledge representation (KR) formalisms for the semantic web, such as RDF itself [73] or OWL [58] also subscribe to this binary relational model and express statements as <subject, predicate, object>. Hence, it makes sense for a query system targeted at the semantic web to adopt a triplebased model that shares the same format as many millions of other triples on the Semantic Web. START come online in 1993 as the first QA system available on the Web. START focuses on questions about geography and the MIT infolab. START adopts a triple-base data model called object-property-value. Using an example presented in [44]: what languages are spoken in Guernsey?, for START the property is languages between the Object Guernsey and the value French. However, the NL is not always semantically sound, the question Who invented the transistor? yields two answers: the inventor of the transistor, but also a description about it. The system described in Litkowski et al. [50], called DIMAP, extracts semantic relation triples after the document is parsed and the parse tree is examined. The DIMAP triples are stored in a database in order to be used to answer the question. The semantic relation triple described consists of a discourse entity (SUBJ, OBJ, TIME, NUM, ADJMOD), a semantic relation that characterizes the entity s role in the sentence and a governing word which is the word in the sentence that the discourse entity stood in relation to. The parsing process generated an average of 9.8 triples per sentence. The same analysis was for each question, generating on average 3.3 triples per sentence, with one triple for each question containing an unbound variable, corresponding to the type of question. DIMAP-QA converts the document into triples. The discourse entities are the driving force in DIMAP triples, key elements (key nouns, key verbs, and any adjective or modifier noun) are determined for each question type. The system categorized questions in six types: time, location, who, what, size and number questions. PiQASso [8] uses a coarse-grained question taxonomy consisting of person, organization, time, quantity and location as basic types plus 23 WordNet top-level noun categories. The answer type can combine categories. For example, in questions made with who, where the answer type is a person or an organization. Categories can often be determined directly from a wh-word: who, when, where. In other cases additional information is needed. For example with how questions the category is found from the adjective following how ( how many or how much ) for quantity, how long or how old for time, etc. The type of what <noun> question is normally the semantic type of the noun which is determined by WNSense, a tool for classifying word senses. Questions of the form what <verb> have the same answer type as the object of the verb. What is questions in PiQASso also rely on finding the semantic type of a query word. However, as the authors say it is often not possible to just look up the semantic type of the word, because lack of context does not allow identifying (sic) the right sense. Therefore, they accept entities of any type as answers to definition questions, provided they appear as the subject in an is-a sentence. If the answer type can be determined for a sentence it is submitted to the relation matching filter during this analysis, the parser tree is flattened into a set of triples and certain relations can be made explicit by adding links to the parser tree. For instance in [8] the example man first walked on the moon in 1969 is presented, in which 1969 depends on in, which in turn depends on moon. Attardi et al. propose short circuiting the in by adding a direct link between moon and 1969 following a rule that says that whenever two relevant nodes are linked through an irrelevant one, a link is added between them SEMANTIC QUESTION ANSWERING OVER ONTOLOGIES We have already mentioned that many systems simply use an ontology as a mechanism to support query expansion in information retrieval. In contrast with these systems, the systems reviewed here provide answers, derived from semantic annotations, to queries expressed in NL. We look into the understanding and query disambiguation process to map NL queries into ontological queries at syntactic and semantic level. AquaLog is one of the first s ontology-based NL QA systems to query the semantic mark-up. The novelty of the 9

10 system with respect to traditional QA systems is that it relies on the knowledge encoded in the underlying ontology and its explicit semantics to disambiguate the meaning of the questions and provide answers, and as pointed on the first AquaLog conference publication [51] it provides a new twist on the old issues of NLIDB. To bridge between the terminology used by the user and the concepts used by the underlying ontology. In a first step, by using linguistic techniques based on GATE [24], the system classifies and breaks up the question into binary linguistic triples. Then, these terms are linked and mapped to ontology elements, generating ontology compliant triples from where the answer is derived. At the syntactic level AquaLog identifies ontology mappings for all the terms and relations in the linguistic triple by considering the entity labels. It relies on simple, string-based comparison methods (e.g., edit distance metrics) and WordNet to look-up lexically related words (synonyms). At the semantic level, AquaLog s interactive ontology based relation similarity service uses the ontology taxonomy and relationships (domain knowledge) to semantically disambiguate between two or more possible terms or relations in order to interpret a query, and to link the ontology triples between themselves to obtain the equivalent representation of the user query. When the ambiguity cannot be resolved by domain knowledge the user is asked to choose between the alternative readings. AquaLog is agnostic to the domain of the ontology that it exploits and it needs very little customization being almost ontology independent. While AquaLog is ontology portable, the user still needs to tell the system which ontology is going to be used. Furthermore, AquaLog includes a learning component, which ensures that the performance of the system improves over time, in response to the particular community jargon used by the end users. Some major limitations identified in AquaLog so far are due to linguistic and semantic coverage, lack of appropriate reasoning services or lack of enough mapping mechanism to interpret a query, and the constraints imposed by the ontology structures. For instance, AquaLog does not yet fully exploit quantifier scoping ( each, all, some ) and negative words (such as not and never or implicit negatives such as only, except and other than ), and as AquaLog is used for stand-alone questions, it does not handle the linguistic phenomenon called anaphora, in which pronouns (e.g. she, they ), and possessive determiners (e.g. his, theirs ) are used to denote implicitly entities mentioned in an extended discourse. Also AquaLog is limited to queries that are not translated into more than two triples (for example in the question which researchers have publications in KMi related to social aspects, where the linguistic triples obtained are: <researchers, have publications, KMi> <which is/are, related, social aspects>, the relation have publications refers to publication: has author in the ontology, therefore, we need not two but three triples to represent three relationships between the terms: <research,?, publications> <publications,?, KMi> <publications, related, social aspects>). AquaLog similarity services do not detect compound nouns (i.e. French researchers is in fact a compound noun as there is an implicit relation between French and researchers ) and at the moment AquaLog cannot associate linguistic relations to non-atomic semantic relations. In other words, it cannot infer an answer if there is not a direct relation in the ontology between the two terms implied in the relationship, even if the answer is in the ontology. Also, the ontology design determines the questions which may be successfully answered. For example, when one of the terms in the query is not represented as an instance, relation or concept in the ontology but as a value of a relation (string). Orakel 4 [20] is a NL interface which translates wh-queries into F(rame)-logic and evaluates them with respect to a given knowledge base. The main feature is that it makes use of a compositional semantic construction approach thus being able to handle questions involving quantification, conjunction and negation in a classical way. In order to translate factual wh-queries 5 it uses an underlying syntactic theory built on a variant of Lexicalized Tree Adjoining Grammar (LTAG), extended to include ontological information. The scope of ORAKEL is limited to the domain lexicon and the F-Logic KB. The parser makes use of two different lexicons: the general lexicon and the domain lexicon. The general or domain independent lexicon includes closed-class words such as determiners i.e. a, the, every, etc., as well as question pronouns, i.e., who, which, etc. and thus is domain independent. The domain lexicon varies from application to application and its generated partially automatically for each application out of the KB (the ontological lexicon is derived fully automatically from the domain ontology loaded in the system, it relies on the labels of the concepts and instances to generate appropriate grammar trees) and by a domain expert. The only lexicon component which has to be generated by the user is the domain-specific lexicon in which verbs, adjectives and relational nouns are mapped to corresponding relations specified in the domain ontology. The semantic representation of the words in the domain independent lexicon makes reference to domain independent categories as given for example by a foundational ontology such as DOLCE. This obviously assumes that the domain ontology is somehow Factual or factoid in this context means that answers are ground facts as typically found in KBs 10

11 aligned to the foundational categories provided by the foundational ontology. In other words, the user is only involved in the creation of the domain specific lexicon, which is actually the most important lexicon as it is the one containing the mapping of linguistic expressions to domain-specific predicates. The user has to instantiate subcategorization frames and maps these to domain-specific relations in the ontology. In regards to semantics, it uses a subcategorization information automatically acquired from a big corpus, a statistical parser, and thewordnet synsets (in the most frequent sense) that best generalize the selectional preferences at each argument in the subcategorization frame. Another semantic search engine, which understands the sense of the user query to retrieve multimedia resources from a knowledge base, is the Librarian found in [49]. First, the NL query is translated into an unambiguous logical form and then the query is mapped to an ontology to generate a semantic query and solve ambiguities. It relies on simple, string-based comparison methods (e.g., edit distance metrics) and a domain dictionary to look-up lexically related words (synonyms). General purpose dictionaries like WordNet are not appropriate for very specific domains. On the other hand, creating a dictionary is very expensive in terms of portability but the strong advantage of this is a very high performance (from 229 user queries 97% were correctly answered in the evaluation). The e-librarian does not return the answer to the user s question, but it retrieves the most pertinent document(s) in which the user finds the answer to her/his question. As stated on [11] the SW provides a stable scaffolding for machine-based processing but the casual user is typically overwhelmed by the formal logic of the Semantic Web. To bridge the gap between the end user and the SW s capabilities they created GINO a guided input natural language editor that allows user to edit and query ontologies in a quasi-nl English. It uses a small static grammar which dynamically extends with elements from the loaded ontologies and allows an easy adaptation to new ontologies. GINO provides a guided and controlled entry to overcome the habitability problem of NL systems. GINO relies on a static sentence structure grammar used to parse sentences which is dynamically extended based on the structure and vocabulary of the loaded ontologies. When the user enters a sentence, an incremental parser relies on the grammar to constantly check the user entry to (1) propose possible continuations to the sentence and (2) prevent entries that would not be grammatically interpretable. GINO translates quasy English sentences into new triple sets or SPARQL statements. However, the major different between GINO and NLIs is that GINO does not use any predefined lexicon beyond the vocabulary that is defined in the static sentence structure grammar and provided by the loaded ontologies. Furthermore, the vocabulary is closed and limited, it does not try to semantically understand the entries or go beyond the vocabulary defined in the grammar and the ontologies. The approach in domain independent systems like AquaLog and QUERIX to allow NL queries and address ambiguities is asking the user for clarification whenever needed. In the paper by R. Basili [9], the possibility of building an ontology-based question answering system in the context of the semantic web is discussed. Their approach is being investigated in the context of EU project MOSES, with the explicit objective of developing an ontology-based methodology to search, create, maintain and adapt semantically structured Web contents according to the vision of semantic web. As part of this project, they plan to investigate whether and how an ontological approach could support QA across sites. They have introduced a classification of the questions that the system is expected to support and see the content of a question largely in terms of concepts and relations from the ontology. Basili et al. say that they will investigate how an ontological approach could support QA across a federation of sites within the same domain: since each node has its own version of the domain ontology, the task of passing a question from node to node may be reduced to a mapping task between (similar) conceptual representations. However, to support QA across ontologies they use the XeOML [70] mapping language to allow them to manually specify simple as well as complex mappings a priori between two ontologies describing the same domain. The knowledge-based approach described in [21] is to augment on-line text with a knowledge-based questionanswering component, [ ] allowing the system to infer answers to users questions which are outside the scope of the prewritten text. It assumes that the knowledge is stored in a knowledge base and structured as an ontology of the domain. Like many of the systems we have seen, it has a small collection of generic question types which it knows how to answer. Question types are associated with concepts in the KB. For a given concept, the question types which are applicable are those which are attached either to the concept itself or to any of its superclasses. A difference between this system and others we have seen is the importance it places on building a scenario, part assumed and part specified by the user via a dialogue in which the system prompts the user with forms and questions based on an answer 11

12 schema that relates back to the question type. The scenario provides a context in which the system can answer the query. The user input is thus considerably greater than we would wish to have in an open semantic scenario. Further discussion on advantages, disadvantages and challenges with respect to semantic QA: The aim is to provide a system which does not require users to learn specialized vocabulary, or to know the structure of the knowledge base, but as pointed out in [23], though they have to have some idea of the contents of the domain they may have some misconceptions. Therefore, to be realistic, some process of familiarization at least is normally required. Moreover, as indicated in [23], it is difficult to devise a sublanguage which is sufficiently expressive yet avoids ambiguity and seems reasonably natural. We see ontological semantic systems, like AquaLog, to be complementary to open domain QA systems. Open QA systems use the Web as the source of knowledge and provide answers in the form of selected paragraphs (where the answer actually lies) extracted from very large open-ended collections of unstructured text. The key limitation of an ontology-based system (as for closed-domain systems) is that it presumes the knowledge the system is using to answer the question is in a structured knowledge-base in a limited domain. However, ontological semantic systems exploit the power of ontologies as a model of knowledge and the availability of semantic markup offered by the SW to give precise, focused answers rather than retrieving possible documents or pre-written paragraphs of text. In particular, semantic markup facilitates queries where multiple pieces of information (that may come from different sources) need to be inferred and combined together. For instance, when we ask a query such as what are the homepages of researchers who have an interest in the Semantic Web?, we get the precise answer. Behind the scenes, AquaLog is not only able to correctly understand the question but is also competent to disambiguate multiple matches of the term researchers on the SW and give back the correct answers by consulting the ontology and the available metadata. As Basili argues in [9] open domain systems do not rely on specialized conceptual knowledge as they use a mixture of statistical techniques and shallow linguistic analysis. Ontological Question Answering Systems [ ] propose to attack the problem by means of an internal unambiguous knowledge representation. Probably, the main benefits of an ontology-based QA system on the SW, when compared to other type of QA systems, is that it can use the domain knowledge provided by the ontology to cope with words apparently not found in the KB and to cope with ambiguity (mapping vocabulary or modifier attachment) in a way that made the system completely portable. For instance, the AquaLog disambiguation techniques are sufficiently general to be applied in different domains. In AquaLog, disambiguation is regarded as part of the translation process, if the ambiguity is not solved by domain knowledge, then the ambiguity is detected and the user is consulted before going ahead (to choose between alternative reading on terms, relations or modifier attachment). As stated on [11] the major drawbacks of these systems are the domain-dependency of intelligent NLI that tend to require long amounts of domain specific knowledge to perform both complex semantic interpretations and high performance, and also the habitability problem 6. Both of them account for the fact that we are still far from the successful use of full NL interfaces to the SW. GINO tries to solve the habitability problem by using a guided and controlled NLI. However, we believe that the main limitation of the state-of-the-art semantic systems is that although portable (the system being agnostic to the domain of the underlying ontology) the scope of these systems are limited to the amount of knowledge encoded in one ontology (AquaLog, ORAKEL, GINO, e-librarian) or set of a priori defined ontologies in the same domain (Basili et al.). One way to overcome these limitations is to extend semantic systems in the direction of open question answering. So, the lack of very complex reasoning is substituted by the ability to deal and find connections with large amounts of heterogeneous data. However, in multi-ontology SW scenario, neither to involve the user in the difficult task to provide a domain-specific grammar for the system like in ORAKEL, nor the use of guided and controlled interfaces like GINO, which generates a dynamic grammar rule for every class, property and instance and present pop-up boxes to the user to offer all the possible completions to the user s query, are feasible solutions in a multi-ontology scenario. So, by using n-ontologies the habitability problem is partially reduced as the brittleness problem (it can only tell you what it already knows), which is typical of ontology driven systems, is also reduced. However, the usability of the system, so the user knows what subset on NL is possible to ask and the system 6 the habitability problem is the mismatch between the user s expectations and the capabilities of the NL system [32 ester]. 12

Is Question Answering fit for the Semantic Web?: a Survey.

Is Question Answering fit for the Semantic Web?: a Survey. Vanessa Lopez a,*, Victoria Uren b, Marta Sabou c and Enrico Motta b a Knowledge Media Institute. The Open University. Walton Hall, Milton Keynes,