THE TECHNIQUES FOR THE ONTOLOGY-BASED INFORMATION RETRIEVAL

Myunggwon Hwang 1, Hyunjang Kong 1, Sunkyoung Baek 1, Kwangsu Hwang 1, Pankoo Kim 2
1 Dept. of Computer Science, Chosun University, Gwangju, Korea
Tel: +82-62-230-7799
E-mail: {mghwang, kisofire, zamilla100, hwangs00ks}@chosun.ac.kr
2 Dept. of CSE, Chosun University, Gwangju, Korea
Tel: +82-62-230-7636
E-mail: pkkim@chosun.ac.kr

Abstract
The use of ontologies to address the problems of existing keyword-based search has been widely studied. Efficient ontology-based information retrieval requires several factors to be considered. In this paper, we describe the techniques required for ontology-based information retrieval.

Keywords: Ontology, Query-Engine, two-column format, IEEE format.

1. Introduction
Nowadays, people search for information on the web, and existing search engines help them do so efficiently. Information retrieval in the current web environment is nevertheless still unsatisfactory. The basic reason is that computers cannot understand web content the way humans do. To overcome this limitation, Tim Berners-Lee proposed the semantic web in the late 1990s. The core of the semantic web approach is the ontology. That is, the degree of improvement in semantic retrieval and the success of the semantic web depend on the completeness and quality of the ontologies.
In this paper, we propose several techniques for a semantic information retrieval system. The ontology-related techniques carry particular weight in our system. Section 2 introduces related work. Section 3 describes the techniques for ontology-based information retrieval in detail. Section 4 evaluates our study. Finally, we conclude our study and suggest future work.

2. Related Works
In developing an ontology-based information retrieval (OBIR) system, several factors, listed in table 1, must be considered.

Table 1. Factors to consider when developing the OBIR system
1. Decision on the application's features that use the ontology
2. Identification of the factors related to the user
3. Analysis of the information and knowledge of the specific organization
4. Ownership problem
5. Examination of the system in the specific organization
6. Decision about the inference processing
7. Decision on the evaluation standard and measurement standard
8. Decision on the applicable range
9. Consideration of data noise
10. Analysis of the ontology management tool and processing steps

As table 1 shows, the developer of an OBIR system has to consider many factors. We also studied how ontologies are applied in information retrieval systems by analyzing several OBIR systems developed so far.
Firstly, an OBIR system was developed in the OntoWeb project[1,2]. In this project, the ontology serves as a guide for finding more information related to the user's queries, and the system attempts to process the meaning of the context.
Secondly, OntoBroker is an ontology-based system developed for analyzing web documents and processing user queries[3]. OntoBroker proposed a methodology for converting HTML documents into an ontology structure, and people can search for information and understand the contents of the documents through the OntoBroker interface. In OntoBroker, the ontology is the common language of the information provider and the searcher, and it consists of concepts, relationships and specific rules.
Thirdly, MELISA (Medical Literature Search Agent) is a document retrieval system for the medical domain and a prototype system that uses an ontology. MELISA uses a medical ontology to address user query problems and to improve the retrieval accuracy of medical documents[6].
Figure 1 illustrates the structure of the information retrieval system in the semantic web[4].
The system consists of the search engine and the ontology. In this structure, the upper part is in charge of the search engine, and the lower part consists of the ontology-related techniques built on the ontology repository.

Figure 1. The structure of the information retrieval system in the semantic web

93560-1365 - Feb. 12-14, 2007 ICACT2007

Most of the above-mentioned information retrieval systems attempted semantic information search using an ontology. However, developing an OBIR system requires many ontology techniques to be considered, such as the creation, management, inference and query processing of ontologies. When these techniques are combined with one another, the ideal OBIR system can be developed.

3. The Techniques for Ontology-based Information Retrieval
In our approach, the ontology plays the most important role, and we focus on how to manage ontologies efficiently and how to apply them to existing search engines. In this section, we describe the techniques for ontology-based information retrieval and then propose an efficient ontology-based information retrieval model composed of the core ontology-related techniques. Successful ontology-based information retrieval requires the following ontology-related techniques, which we designed.
- Ontology Management Tool (OntoMan)
- Ontology Repository
- Web Ontology Crawler
- Ontology Query Engine (CQEFT)

3.1. Ontology Management Tool (OntoMan)
Firstly, we consider how to manage ontologies efficiently. In our study, we developed an ontology management tool named OntoMan. OntoMan supports all the steps of ontology building, such as creation, deletion, editing, modification and storing. Users can build an ontology and then store it in the ontology repository using OntoMan. OntoMan provides a GUI environment and has several features. Its significant features are described in detail below.

3.1.1. The Support for Automatic Ontology Building Methodology
OntoMan first constructs a frame ontology for the specific domain automatically, based on WordNet. The system then adds more information to the frame ontology from a specific input document written by domain experts. The automatic ontology building methodology is explained in table 2.

Table 2. The steps of the automatic ontology building methodology
1. The user accesses OntoMan.
2. The user selects the specific domain (based on WordNet).
3. The system constructs the frame ontology (based on WordNet).
4. The user inputs the specific document (written by a domain expert).
5. The system adds more information to the frame ontology.
6. The user modifies the ontology (add, delete, edit using the OntoMan interface).
7. The system converts the constructed ontology to OWL format.

3.1.2. The Support for the GUI Environment
Users can build ontologies easily with OntoMan. OntoMan is GUI-based and, in particular, represents the ontology as a tree structure. Figure 2 shows a screenshot of the OntoMan system.

Figure 2. The interface of the OntoMan system

3.1.3. The Support for the Writing Guide about the Ontology Language
OntoMan provides a writing guide for RDF and OWL. RDF and OWL are widely used for building ontologies nowadays. However, their specifications are very complex and hard to understand. Thus, OntoMan provides a writing guide for both.
As mentioned before, OntoMan is a very important technique for managing ontologies efficiently. Until now, most ontology-based information retrieval models have ignored the steps from ontology creation to ontology management. These
systems simply used pre-built ontologies or built new ones from scratch. Thus, the interoperability among the systems is very low. OntoMan was therefore designed to support all the steps of ontology building.

3.2. Ontology Repository
Here, we consider how to provide a large number of ontologies to users efficiently. The ontology repository collects and stores ontologies in a dedicated space and is connected to OntoMan, so every user can access the repository to use the existing ontologies. The repository contains the ontology files (RDF and OWL) and the fact triple files derived by the inference engine. In our approach, when an ontology is newly created or collected, the system creates fact triple files based on the pre-defined inference rules and stores them in the ontology repository together with the original ontology files.

3.3. Web Ontology Crawler
With this technique, we consider the reusability of ontologies. The web ontology crawler finds ontologies on the web and stores them in the ontology repository. To reuse the existing ontologies efficiently, it consists of a domain classifying module, a ranking module and a retrieval module.

3.3.1. Ontology Classifying
The domain classifying module analyzes an ontology and decides its domain concept. To analyze the concepts, the module first matches the concepts in the ontology to WordNet's concepts. The domain concept of the ontology is then determined with the Resnik methodology, whose formula identifies the minimum highest concept in WordNet. Figure 3 explains how the domain concept of an ontology is decided using the formula.

Figure 3. The decision of the domain concept based on the WordNet

In figure 3, the minimum highest concept becomes the domain concept of the ontology. After deciding the domain concept, the module creates the index ontology for all collected ontologies, based on the domain concepts of the ontologies.

3.3.2. Ontology Ranking
Even when ontologies are assigned to the same domain concept, their degree of integrity differs. When two or more ontologies are assigned to the same domain, we should rank them so that the information is provided efficiently. In this module, we measure the integrity of each ontology using the Jaccard formula and assign a rank to each ontology.

3.3.3. Ontology Retrieval
The retrieval module supports efficient ontology search among the many ontologies stored in the ontology repository. Table 3 explains the whole processing steps of the web ontology crawler.

Table 3. Processing steps of the web ontology crawler
1. Analyze the HTML document
2. Store the linked addresses in the queue
3. Transfer the RDF ontology to the classifying module
4. Match the concepts of the RDF ontology to the concepts of WordNet
5. Decide the domain of the ontology using the Resnik formula
6. Create the index ontology
7. Pass the results to the ranking module
8. Evaluate the completeness of the ontology using the Jaccard formula and assign a rank to each ontology
9. Show the retrieval results in order, based on the index ontology

Figure 4. Processing steps of the web ontology crawler

3.4. Ontology Query Engine (CQEFT)
Here, we consider how to evaluate the ontologies efficiently. In this study, we designed a new ontology query engine, the CQEFT (Controlled Query Engine For Triple).
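To make the notion of a query engine for triples concrete, the following minimal sketch stores facts as (subject, predicate, object) tuples and answers a query by pattern matching, with None acting as a wildcard. It is an illustration only: the function name match and the sample wine facts are hypothetical and do not come from the CQEFT implementation.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def match(triples: Iterable[Triple], pattern: Pattern) -> Iterator[Triple]:
    """Yield every triple that agrees with the pattern; None is a wildcard."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple


# Hypothetical facts, invented for this illustration.
FACTS = [
    ("WhiteZinfandel", "rdf:type", "Wine"),
    ("WhiteZinfandel", "hasSugar", "Sweet"),
    ("Chardonnay", "rdf:type", "Wine"),
    ("Chardonnay", "hasSugar", "Dry"),
]

# Which subjects are sweet? Only the predicate and object are fixed.
SWEET = [s for s, _, _ in match(FACTS, (None, "hasSugar", "Sweet"))]
print(SWEET)
```

A text-based interface for ordinary users could translate a typed question into one or more such patterns, so the user never has to write the pattern syntax directly.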
The CQEFT consists of a reasoning part and a query processing part. The reasoning part contains 55 inference rules in total: rules about the basic graph model (30), inference rules supported by the web ontology language vocabularies (20) and consistency check rules (5). When a new ontology is created or collected, the reasoning part generates triple files based on the 55 inference rules. The query processing part then extracts information from the triple files produced by the reasoning part. The query processing part supports a text-based query interface for ordinary users, so users can search for information easily even if they do not know a complex query syntax. Figures 5 and 6 show the reasoning part and the query processing part of the CQEFT.

Figure 5. The reasoning part of the CQEFT

Figure 6. The query processing part of the CQEFT

3.5. The Structure of the Ontology-based Information Retrieval Model
In sections 3.1, 3.2, 3.3 and 3.4, we explained the four techniques required for achieving ideal ontology-based information retrieval. In this paper, we combine these techniques, and figure 7 illustrates the structure of the resulting ontology-based information retrieval model.

Figure 7. The structure of the ontology-based information retrieval model

Our approach was designed to retrieve information semantically. The flow of our approach is shown in table 4.

Table 4. The flow of our system
1. The user accesses the ontology-based information retrieval model.
2. The user inputs the query through the text-based query processing part (CQEFT).
3. The system finds the information about the query based on the pre-reasoned triple files.
4. The system gives the results to the user.
5. OntoMan creates or manages the ontologies.
6. The web ontology crawler collects the ontologies from the web.

In our study, we found that semantic information retrieval is possible by following the above processing steps in our approach.

4. Evaluations
To evaluate our system, we constructed a scenario, which is as follows.
A man invites his girlfriend for dinner, and his girlfriend is a vegetarian. So he decides to prepare TOFU Steak for dinner and wants to buy a bottle of wine that goes well with it. From a web search engine, he finds the information that the wine well matched with TOFU Steak is a strong sweet white Zinfandel. He then tries to find a bottle of strong sweet white Zinfandel on the web.
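How pre-reasoned triples could serve such a scenario can be sketched as follows. The example assumes two RDFS-style rules (transitivity of rdfs:subClassOf, and propagation of rdf:type to superclasses). These stand in for the inference rules of the reasoning part, which are not listed in this paper. The class names, the matchesWine property, the bottle identifier and the helper names materialize and wines_for are all hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def materialize(facts: Set[Triple]) -> Set[Triple]:
    """Apply the two assumed rules until no new triple appears (a fixpoint)."""
    closed = set(facts)
    while True:
        new: Set[Triple] = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                if o != s2:
                    continue
                # Rule 1: subClassOf is transitive.
                if p == SUBCLASS and p2 == SUBCLASS:
                    new.add((s, SUBCLASS, o2))
                # Rule 2: an instance of a class is an instance of its superclasses.
                if p == TYPE and p2 == SUBCLASS:
                    new.add((s, TYPE, o2))
        if new <= closed:
            return closed
        closed |= new


def wines_for(dish: str, triples: Set[Triple]) -> List[str]:
    """Return the items whose type is a wine class matched with the dish."""
    wanted = {o for s, p, o in triples if s == dish and p == "matchesWine"}
    return sorted(s for s, p, o in triples if p == TYPE and o in wanted)


# Hypothetical ontology fragment, invented for this illustration.
FACTS: Set[Triple] = {
    ("WhiteZinfandel", SUBCLASS, "SweetWhiteWine"),
    ("SweetWhiteWine", SUBCLASS, "Wine"),
    ("bottle_17", TYPE, "WhiteZinfandel"),
    ("TofuSteak", "matchesWine", "SweetWhiteWine"),
}

PRE_REASONED = materialize(FACTS)
print(wines_for("TofuSteak", FACTS))
print(wines_for("TofuSteak", PRE_REASONED))
```

On the raw facts the lookup finds nothing, because the bottle is only asserted to be a WhiteZinfandel. After materialization the inferred type makes it retrievable, which is the benefit of reasoning once in advance and querying the stored result.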
In the evaluation, we compare our system with two other web search engines, Google and Yahoo, which are the standard web search engines. In addition to our scenario, we prepared three more queries for evaluating our system, and we measured the accuracy rate of the three systems on the four queries. The formula for the accuracy rate is as follows.

Accuracy rate = correct results / total searched results
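Applied to raw counts, the formula is a single division. The following sketch evaluates it for the counts reported in Table 5. The function name accuracy_rate and the dictionary layout are illustrative assumptions, not part of the system.

```python
from typing import Dict, Tuple


def accuracy_rate(correct: int, total: int) -> float:
    """Accuracy rate = correct results / total searched results."""
    if total <= 0:
        raise ValueError("total searched results must be positive")
    return correct / total


# (correct results, total searched results) per system, copied from Table 5.
COUNTS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "Luncheon": {"Our system": (45, 52), "Google.com": (41, 100), "Yahoo.com": (35, 100)},
    "Shellfish food": {"Our system": (62, 87), "Google.com": (53, 100), "Yahoo.com": (47, 100)},
    "TOFU Steak": {"Our system": (11, 19), "Google.com": (8, 100), "Yahoo.com": (11, 100)},
    "Spicy food": {"Our system": (23, 34), "Google.com": (16, 100), "Yahoo.com": (17, 100)},
}

RATES = {
    query: {system: accuracy_rate(c, t) for system, (c, t) in row.items()}
    for query, row in COUNTS.items()
}

for query, row in RATES.items():
    print(query, {system: f"{rate:.1%}" for system, rate in row.items()})
```

Note that the denominators differ: the Google and Yahoo counts are taken over the first one hundred results, while the counts for our system are taken over all results it returned.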
Table 5. Accuracy rates
Query                                        Our system   Google.com   Yahoo.com
Luncheon (light dry wine)                    45/52        41/100       35/100
Shellfish food (dry white wine)              62/87        53/100       47/100
TOFU Steak (sweet strong white Zinfandel)    11/19        8/100        11/100
Spicy food (sweet light white wine)          23/34        16/100       17/100

For Google and Yahoo, the queries returned a very large number of results. We therefore set a cutoff at the first one hundred items, in order, and counted the correct results among these one hundred.

Figure 8. Accuracy rates using graph

Table 5 and figure 8 show the accuracy rate of the search results of each system. Our system achieved the highest accuracy rate for all four queries. From the results, we found that semantic information retrieval is possible using our system.

5. Conclusion
In this paper, we propose a semantic information retrieval system based on ontologies. In our study, we try to address the limitations of existing ontology-based information retrieval systems. To address these problems, we propose and develop all the techniques related to the ontology theory, and then design the ontology-based information retrieval model by composing all these techniques. Through the evaluation, we found that information can be retrieved semantically using our approach.

Acknowledgement
"This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Advancement)" (IITA-2006-C1090-0603-0040)

References
[1] Uschold, M., King, M., Moralee, S., Zorgios, Y., "The Enterprise Ontology", AIAI-TR-195, Artificial Intelligence Applications Institute (AIAI), the University of Edinburgh, 1997.
[2] http://www.cs.umd.edu/projects/plus/shoe/.
[3] http://ontoweb.aifb.uni-karlsruhe.de.
[4] Aitken, S., Reid, S., "Evaluation of an Ontology-Based Information Retrieval Tool", 12th European Conference on Artificial Intelligence (ECAI'00) Workshop on Applications of Ontologies and Problem-Solving Method, 2000.
[5] Gruber, T., "Toward Principles for the design of ontologies used for knowledge sharing", International Journal of Human-Computer Studies, vol.43, no.5/6, pp. 907-928, 1995.
[6] Abasolo, J.M., Gómez, M., "MELISA. An ontology-based agent for information retrieval in medicine", ECDL 2000 Workshop on the Semantic Web (SemWeb2000), pp. 73-82, 2000.