Construction of Knowledge Base for Automatic Indexing and Classification Based on Chinese Library Classification

Han-qing Hou, Chun-xiang Xue
School of Information Science & Technology, Nanjing Agricultural University, China
hqhou@njau.edu.cn

Abstract

Class numbers, descriptors and keywords are three kinds of subject concept identifiers, among which there exist conceptual mapping relationships, i.e. compatibility. According to this principle, we construct a CLC Knowledge Base on the basis of the Chinese Library Classification for automatic indexing and classification. We compare it with the CLC system to show its advantages for automatic information processing and concept searching. We then introduce at length some key technologies used in its construction and describe in brief their application to automatic indexing, automatic classification and concept searching.

Keywords: automatic indexing, automatic classification, knowledge base, knowledge organization system, Chinese Library Classification.

1. Introduction

Knowledge organization systems (KOS) refer to all kinds of semantic tools that are used to describe and interpret human knowledge and its relationships, such as library classifications, lists of subject headings, thesauri, semantic networks, maps of subject domains and ontologies. Library classifications, lists of subject headings and thesauri have played an important part in organizing traditional information resources, while semantic networks, subject maps and ontologies are designed for the second-generation (semantic) Web. Some KOS in current use were constructed by improving traditional classifications or thesauri, inheriting their established knowledge systems and abundant vocabulary. These systems have some features and functions of semantic networks and ontologies, which can improve knowledge processing and the efficiency of information retrieval.
The knowledge base discussed in this article, called the CLC Knowledge Base, is also a KOS: an expert system for knowledge organization based on the Chinese Library Classification (hereinafter CLC). Using statistical methods, it reveals the concept mapping relationships among class numbers, descriptors and keywords in manually indexed records, and it can therefore be used for automatic indexing, automatic classification and concept searching.

2. Principles of Construction of the CLC Knowledge Base

Classification schemes, thesauri and natural language are three different kinds of information language with different symbols and organizational approaches. In essence, however, they are the same: class numbers, descriptors and keywords can all be used to express a subject concept. Conceptual mapping relationships, i.e. compatibility relationships, are hidden among them. Most libraries hold numerous manually indexed document records that simultaneously contain class numbers and descriptor strings or keyword strings. By processing these data, we can mine the concept mapping relationships among class numbers, descriptor strings and keyword strings in order to construct a knowledge base.

The CLC is a library classification based on scientific classification and conceptual relations, so we can regard it as a semantic network that can be used to organize all kinds of information. The reasons why we choose the CLC as the frame of the knowledge base are:

Both classification schemes and thesauri, indeed all KOS, use the methodology of classification. The former use explicit classification systems, while the latter use hidden ones, such as cross-reference systems, categorical indexes and hierarchical indexes. The classification scheme is the main part of the integrated vocabulary system, that is, the classification/thesaurus system, and is much easier to accept and understand.

The CLC is a large universal classification compiled by Chinese experts. It has been broadly used to classify and search books, audio-visual materials and other sorts of information. The CLC has the widest influence domestically and numerous users; it has therefore been regarded as the national standard, though not officially authorized.

Since it was first published in 1975, the CLC has been continuously revised to meet information processing and access needs. It is currently in its 4th edition; its electronic edition is stored in MARC format. The new edition offers a more logical knowledge organization structure, more extensive coverage of knowledge, and faceted coordination.

The CLC is used in most collections of Chinese documents. If we want to use these indexing records to construct knowledge bases, choosing the CLC as the framework lets us avoid converting the class numbers of other classification schemes into CLC class numbers.

Most experts accept the feasibility of applying the CLC to organize Internet resources. Meanwhile, digitization, faceted coordination, combination with natural language and hyperlinks have been added to the CLC; it can therefore be applied not only in the traditional library but also in web environments. Our automatic indexing and classification system is designed to organize both traditional documents and digital information.
The applicability of the CLC in both traditional library and web environments meets our needs. Given these advantages, we use the CLC as the frame when constructing the knowledge base to realize concept indexing and searching.

3. Comparison between the structure of the CLC Knowledge Base and the CLC system

The CLC includes schedules, tables and indexes, like other classifications. Following the trend toward integrating classifications with thesauri, the CLC maps its class numbers to descriptors of the Chinese Thesaurus, as DDC is mapped to LCSH, yielding an integrated vocabulary named the Classified Chinese Thesaurus (hereinafter CCT), whose first edition was compiled from 1987 to 1993. At that time the CLC, the Chinese Thesaurus and the CCT made up a KOS, named the CLC system, shown in Figure 1. Although the CLC system did very well in the traditional library, its disadvantages are revealed when it is applied to the automatic processing of digital information on the Web. The disadvantages are as follows.

The CLC system, both classification scheme and thesaurus, is a controlled language and lacks the flexibility of natural language.

The CLC system has a long revision period, about eight to nine years, so many new words and subjects are not incorporated in a timely manner.

The present classifications and thesauri are small in scale because they are printed editions.

The CLC system cannot be directly applied to automatic information processing.

We choose the CLC schedule to organize the knowledge base and improve on it. Through statistics and computer technology, we can discover compatibility relationships among the class numbers, descriptor strings and keyword strings in the knowledge base. Compared with the CLC system, the knowledge base adds new features and functions, i.e. an interface to natural language, continuously increasing scale and timely updates, to adapt to the development of information organization on the Web.

The knowledge base comprises three parts: a knowledge base for classification, a knowledge base for subject indexing and a supplementary knowledge base. The concordance of class numbers and keyword strings is the main part of the knowledge base for classification. A go-list, a stop-list, a dictionary of synonyms and a semantic dictionary compose the knowledge base for subject indexing. Tables of areas, periods and document types compose the supplementary knowledge base, which is used to extract subjects concerning area, period and document type from the documents. The structure and composition of the knowledge base are shown in Figure 2.

The two figures respectively show the frame of the CLC system and the structure of the knowledge base. Both are based on the CLC schedules and map class numbers to descriptors or keywords, so both can be used to integrate classification with subject indexing. However, compared with the CLC system, the knowledge base is more suitable for automatic indexing and intelligent searching in content, scale, structure and function. The reasons are as follows.

The CLC system reveals only the mapping relationships between CLC class numbers and descriptors of the Chinese Thesaurus, while the knowledge base reveals the mapping relationships among class numbers, descriptor strings and keyword strings.

The CLC system comprises only the class numbers and descriptors included in the CLC schedule and the Chinese Thesaurus, whereas the data of the knowledge base come from manually indexed records, which include a great many built class numbers, keywords and new words. The scale of the knowledge base is therefore larger than that of the CLC system. In the CLC system, one class number maps to at most 20 descriptors or strings, and to 2-3 on average. In the knowledge base, one class number maps to 10-14 keyword strings on average, and in some cases to several hundred. The knowledge base can therefore reveal the concepts hidden in the classes.

The terms in the CLC system are updated very slowly because both the CLC and the CCT have long revision periods and are maintained by hand. The knowledge base, however, is compiled and maintained by machine and can incorporate newly proposed terms in real time. More vocabulary, especially new words, can lead to higher indexing consistency and correctness.

Owing to its limited scale and vocabulary, the CLC system is applied only to indexing and classifying literature by hand. The knowledge base, by contrast, can ensure higher quality and correctness because of its larger scale, richer vocabulary and flexibility. Moreover, the knowledge base is applied not only to automatic indexing and classification but also to more intelligent information searching. The knowledge base can give descriptors and keywords as indexing terms at the same time, index facets such as areas, periods and document types separately, and use its dictionary of synonyms to add entry words. All these advantages provide users with multiple-aspect and intelligent searching.

In general, a KOS and the library's collections are separate. In our system, we use database and hyperlink technology to connect the knowledge base with the collections of literature, like the directory of a search engine on the Internet.

4. Key technologies of construction of the CLC Knowledge Base

There are some key technologies in the construction of the CLC Knowledge Base, which we introduce below.

4.1. Collecting source data from manually indexed records and library classification

First, we collect source data to build the source database. There are four kinds of data source: (1) the indexes of the CLC and the parallel list of class numbers and descriptor strings of the Classified Chinese Thesaurus; (2) indexing records of large libraries, e.g. Beijing Library and Shanghai Library, which include CLC class numbers and descriptors of the Chinese Thesaurus; (3) indexing records of periodical literature in bibliographic databases, which include CLC class numbers and keyword strings, e.g. the Database for Chinese Periodicals of Science & Technology (namely VIP) and the Database for Social Newspaper and Periodicals edited by Shanghai Library; (4) a database of titles, composed of CLC class numbers and titles from some well-known bibliographic databases. Next, we filter out erroneous and duplicate records to form a source database, which contains the mapping relationships between class numbers and descriptor strings or keyword strings.

4.2. Constructing the knowledge base by statistical methods

After the data collection, we extract terms and class numbers from the source database, compute the frequency of terms and measure the co-occurrence frequency of class numbers and strings to construct the knowledge base. Of all the dictionaries of the knowledge base, the parallel list of class numbers and keyword strings is the most important to construct. Here we use statistical methods to mine the conceptual mapping relationships between class numbers and keyword strings.
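As an illustration of the filtering and counting steps described above, the following Python sketch derives frequency tables for class numbers, keyword strings and their co-occurrence from a list of indexing records. It is a minimal sketch, not the authors' implementation: the record layout (a record id, a class number and a single keyword string), the function name `count_frequencies` and the sample class numbers are assumptions made for the example.

```python
from collections import Counter


def count_frequencies(records):
    """Count class numbers, keyword strings and their co-occurrences.

    `records` is an iterable of (record_id, class_number, keyword_string)
    tuples; this layout is an assumption made for the sketch.
    """
    seen = set()
    freq_clc, freq_keyword, freq_gx = Counter(), Counter(), Counter()
    for record_id, clc, keyword in records:
        clc, keyword = clc.strip(), keyword.strip()
        if not clc or not keyword:
            continue  # treat a record with an empty field as erroneous
        key = (record_id, clc, keyword)
        if key in seen:
            continue  # skip a duplicate of a record already counted
        seen.add(key)
        freq_clc[clc] += 1
        freq_keyword[keyword] += 1
        freq_gx[(clc, keyword)] += 1  # co-occurrence of class number and string
    return freq_clc, freq_keyword, freq_gx


# Illustrative records only; the class numbers are not taken from the paper.
records = [
    ("r1", "TP391", "automatic indexing"),
    ("r2", "TP391", "automatic indexing"),
    ("r2", "TP391", "automatic indexing"),  # duplicate record
    ("r3", "G254", "automatic indexing"),
    ("r4", "G254", ""),                     # erroneous record
]
freq_clc, freq_keyword, freq_gx = count_frequencies(records)
print(freq_gx[("TP391", "automatic indexing")])
```

Keeping the three counts in separate tables mirrors the way the text treats them as separate statistics; a real source database would also need to handle records carrying several keyword strings.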
Using three statistics, namely the frequency of the class number, the frequency of the keyword string and the co-occurrence frequency of the class number and keyword string, we compute two parameters often used in data mining, the support degree and the confidence degree, to discover the mapping relationships between class numbers and keyword strings. We can then generate the knowledge base for automatic classification.

The support degree is the co-occurrence frequency of a class number and a keyword string in the source database. A higher co-occurrence frequency shows that more indexers agree on the conceptual mapping relationship between the two.

$$\mathrm{Support}(\mathit{keyword} \Rightarrow \mathit{clc}) = P(\mathit{clc}, \mathit{keyword}) = \mathit{freq\_gx}$$

P(clc, keyword): the probability that the class number and keyword string co-exist in an indexing record of the source database; it can be measured by the co-occurrence frequency.

Generally speaking, the conceptual mapping relationship between the two can be considered correct if support >= 2. The greater the degree of support, the more reliable the mapping relationship.

The degree of confidence reflects the probability of the class number on the premise that the keyword string has appeared.

$$\mathrm{Conf}(\mathit{clc} \mid \mathit{keyword}) = \frac{P(\mathit{clc}, \mathit{keyword})}{P(\mathit{keyword})} = \frac{\mathit{freq\_gx}}{\mathit{freq\_keyword}}$$

P(clc, keyword): the probability that the class number and keyword string co-exist in an indexing record of the source database; it can be measured by the co-occurrence frequency.
P(keyword): the probability of the keyword string appearing, i.e. the frequency of the string in the whole source data.

If the degree of support and the degree of confidence of a class number and keyword string each reach their threshold, the conceptual mapping relationship between the class number and the keyword string is accepted.

4.3. Measuring the similarity to resolve the many-to-many relationships between class numbers and strings

The relationship between class numbers and keyword strings is many-to-many in the source database. In our system, a string maps to only one class number, so each string must map to a single class number in the knowledge base. There are many measures of the similarity between class numbers and strings, such as MI, LogL and Dice. Here we use the Dice measure to find the best class number for a string.

$$\mathrm{Dice} = \frac{P(\mathit{clc}, \mathit{keyword})}{\tfrac{1}{2}\,[P(\mathit{clc}) + P(\mathit{keyword})]} = \frac{2 \times \mathit{freq\_gx}}{\mathit{freq\_clc} + \mathit{freq\_keyword}}$$

where:
Dice: the similarity between the class number and the keyword string, based on their co-existence;
P(clc): the probability of the class number existing in the source database, viz. the frequency of the class number;
P(keyword): the probability of the keyword string existing in the source database, viz. the frequency of the keyword string;
P(clc, keyword): the probability of the class number and keyword string co-existing, viz. the co-occurrence frequency of the class number and keyword string.
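The three measures defined in Sections 4.2 and 4.3 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the authors' code: the function names, the toy counts and the confidence threshold of 0.5 are hypothetical (the text gives only the support threshold of 2), and support is computed as a raw co-occurrence count, as the text suggests.

```python
def support(freq_gx, clc, keyword):
    """Support: co-occurrence frequency of the class number and keyword string."""
    return freq_gx.get((clc, keyword), 0)


def confidence(freq_gx, freq_keyword, clc, keyword):
    """Confidence: freq_gx / freq_keyword."""
    total = freq_keyword.get(keyword, 0)
    return freq_gx.get((clc, keyword), 0) / total if total else 0.0


def dice(freq_gx, freq_clc, freq_keyword, clc, keyword):
    """Dice: 2 * freq_gx / (freq_clc + freq_keyword)."""
    denom = freq_clc.get(clc, 0) + freq_keyword.get(keyword, 0)
    return 2 * freq_gx.get((clc, keyword), 0) / denom if denom else 0.0


def best_class(keyword, freq_clc, freq_keyword, freq_gx,
               min_support=2, min_conf=0.5):
    """Pick the class number with the largest Dice value among the
    candidates that pass both thresholds (min_conf is a hypothetical value)."""
    candidates = [
        clc for (clc, kw) in freq_gx
        if kw == keyword
        and support(freq_gx, clc, kw) >= min_support
        and confidence(freq_gx, freq_keyword, clc, kw) >= min_conf
    ]
    if not candidates:
        return None
    return max(candidates,
               key=lambda clc: dice(freq_gx, freq_clc, freq_keyword, clc, keyword))


# Toy counts for illustration only; they are not data from the paper.
freq_clc = {"TP391": 4, "G254": 10}
freq_keyword = {"automatic indexing": 5}
freq_gx = {("TP391", "automatic indexing"): 3,
           ("G254", "automatic indexing"): 2}

best = best_class("automatic indexing", freq_clc, freq_keyword, freq_gx)
print(best)
```

With these toy counts, the pair with "TP391" has support 3, confidence 3/5 and Dice 6/9, while the pair with "G254" has support 2, confidence 2/5 and Dice 4/15, so the first class number is selected.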

If one string maps to multiple class numbers, the best class number is the one with the maximum Dice value.

4.4. Using Cilin, a thesaurus of Chinese words, to create a semantic dictionary for recognizing synonyms

Turning keywords into descriptors in subject indexing, measuring the semantic similarity between indexing subjects and terms in the knowledge base in automatic classification, and concept searching all depend on recognizing synonyms. It is therefore important to create a semantic dictionary for this purpose.

Cilin is a semantically classified dictionary, organized like a semantic tree. It divides Chinese words into three levels of classes according to their semantic relationships: 14 major classes, 94 secondary classes and 1428 small classes. The vocabulary of Cilin consists mostly of simple words, which are the morphemes of compounds. By using Cilin to create the semantic dictionary, we can, on the one hand, directly recognize synonyms in the form of morphemes and, on the other hand, mine the synonymous relationships among compounds.

[Semantic code] => (major class) (secondary class) (small class) (group)

where major class => (capital letter), secondary class => (capital letter) (lowercase letter), small class => (capital letter) (lowercase letter) (number) (number), group => (capital letter) (lowercase letter) (number) (number) (number).

For example, the semantic code of the word hotel is [Dm040901]; the corresponding codes of its major class, secondary class, small class and group are (D), (Dm), (Dm0409) and (Dm040901). (D) represents the major class Thing, (Dm) the secondary class Organization, (Dm0409) the word group Hostel under the small class Shop, and (Dm040901) the group Hotel. We can then code all the morphemes by this method to create a semantic dictionary.
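To make the coding scheme concrete, the sketch below splits a semantic code such as Dm040901 into its nested prefixes and counts how many levels two codes share. It is a hedged illustration only: it assumes that every (number) field in the grammar above is two digits wide, which is inferred from the example code, and the function names and the second code Dm040905 are hypothetical.

```python
import re

# One capital letter, one lowercase letter, then three numeric fields.
# The two-digit field width is an assumption inferred from "Dm040901".
_CODE = re.compile(r"^([A-Z])([a-z])(\d{2})(\d{2})(\d{2})$")


def code_prefixes(code):
    """Return the nested prefixes of a full semantic code, shortest first."""
    match = _CODE.match(code)
    if match is None:
        raise ValueError(f"not a full semantic code: {code!r}")
    prefixes, current = [], ""
    for part in match.groups():
        current += part
        prefixes.append(current)
    return prefixes


def shared_depth(code_a, code_b):
    """Count how many leading levels two codes have in common (0 to 5)."""
    depth = 0
    for a, b in zip(code_prefixes(code_a), code_prefixes(code_b)):
        if a != b:
            break
        depth += 1
    return depth


hotel = code_prefixes("Dm040901")
print(hotel)
print(shared_depth("Dm040901", "Dm040905"))  # Dm040905 is a made-up code
```

A larger shared depth means the two morphemes sit closer together in the semantic tree, which is one simple way a tree-structured code could support a semantic distance; the paper does not specify the distance function it uses.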
Through the semantic dictionary, we can analyze the semantics of terms to measure the semantic distance between two terms, turn keywords into descriptors, and measure the semantic similarity between two strings to realize automatic classification and concept searching.

The above introduces the key technologies for constructing the parallel list of class numbers and keyword strings and the semantic dictionary, which are the main strengths of the knowledge base. Since these technologies are particular to the construction of this knowledge base, we have introduced them at length. Other technologies used in the construction are not described in detail here.

5. Application of the CLC Knowledge Base

The knowledge base has the framework of the CLC and is based on manual indexing. It has constructed mapping relationships among class numbers, descriptor strings and keyword strings based on the compatibility principle of classification schemes, thesauri and natural language, and it includes abundant vocabulary, synonyms and mapping relationships between keyword strings and class numbers. The knowledge base can be broadly applied to automatic indexing and classification, and even to concept searching.

5.1. To realize automatic indexing by word segmentation aided by the go-list and stop-list and subject control aided by the dictionary of synonyms

We select the title, abstract, author-supplied keywords, references and so on as the indexing sources; segment the text of the indexing sources using the maximum matching algorithm aided by the go-list and stop-list; calculate word frequency, word number and word position weights to produce ranked indexing terms; and then turn them into descriptors using the dictionary of synonyms.

5.2. To realize automatic classification aided by the parallel list of class numbers and keyword strings, the dictionary of synonyms and the tables of areas, periods and document types

The automatic classification discussed in this article classifies documents by keyword strings and concepts. First, it classifies documents by string rather than by single word, which can improve correctness and precision. Second, it classifies documents by conceptual matching. When matching indexing terms with terms in the knowledge base, it first calculates word-form similarity; if this yields no result, it calculates semantic similarity, aided by the dictionary of synonyms and the semantic dictionary, to work out the best CLC class number with regard to both correctness and speed. Third, it is a case-based method (that is, based on indexing experience). Every record in the knowledge base is an example; the indexing terms or strings are matched against these examples to work out the best classification results. Fourth, the facets of area, period and document type in the text are indexed separately by the subdivisions, which avoids some shortcomings of the CLC system when applied to automatic classification.

5.3. To realize concept searching and multiple-approach searching based on the dictionary of synonyms and the results of automatic indexing and classification

From the perspective of indexing, the results of subject indexing include two parts, i.e. keyword strings and descriptor strings, which let users search not only by keywords and descriptors but also by strings rather than single words; furthermore, it can add retrieval

entries aided by the dictionary of synonyms and realize concept searching through the semantic dictionary to improve searching efficiency. From the perspective of classification, the results of classification include the main class number and the subdivision numbers of area, period and document type, so that users can search information by subject, area, period and document type.

6. Conclusion

The knowledge base, as a KOS based on the frame of the CLC, uses dual indexing records that simultaneously include class numbers and descriptor strings or keyword strings in a bibliographic database, and so has both literary and user warrant. Professionals revise the data of the knowledge base after the statistical computation, which improves its accuracy. At the same time, the knowledge base is constructed by computer-assisted statistical compilation from a large corpus; thus the subjectivity of mapping class numbers to strings can be avoided. Although the knowledge base is based on the CLC, it has broader functions than the CLC system itself.

We think that the current KOS is the combination of indexing languages with modern computer and network technology. The CLC Knowledge Base we have constructed is such an example; it possesses an abundant vocabulary and semantic relationships. It combines traditional indexing languages, such as the CLC and CCT, with modern technologies such as databases, data mining, hyperlinks and computational linguistics. In a sense, it has some features of an ontology suited to today's automated information processing. But the CLC Knowledge Base still has disadvantages with respect to machine understanding and intelligent reasoning. Although the knowledge base has been put to important practical use in the intelligent processing of information, it still needs further research and improvement.

References

[1] Zeng, M.L. (2004). Networked knowledge organization systems/services. New Technology of Library and Information Services, 1:2-3.
[2] Hou, H.Q., Xue, P.J. (2003). Design & construction of knowledge database for automatic classification in Chinese. Journal of the China Society for Scientific and Technical Information, 22(6):681-686.
[3] Zhang, C.Z. (2002). Web concept mining based on text layer model, automatic indexing and automatic classifying based on concept semantic network. Supervised by Han-qing Hou. Master Dissertation of Nanjing Agricultural University, 2002, 6.
[4] Xue, P.J. (2001). Research on intelligent search engine of Chinese economic information based on knowledge database. Supervised by Han-qing Hou. Master Dissertation of Nanjing Agricultural University, 2001, 6.
[5] Zhang, Q.Y. (2002). A Concept and faceted coordinate system for automatic classifying. Library Journal, 6:9-10.
[6] Hou, H.Q. (1998). Construction of the indexing languages compatibility system on the basis of the Classified Chinese Thesaurus. Journal of the National Library of China, 4:35-39, 90.
[7] Mei, J.J. (1983). Cilin Thesaurus of Chinese Words. Shanghai: Shanghai Lexicon Press.