Path Expression Processing in Korean Natural Language Query Interface for Object-Oriented Databases. Jinseok Chae and Sukho Lee

Department of Computer Engineering, Seoul National University, San 56-1, Shinrim-Dong, Kwanak-Ku, Seoul, 151-742, Korea
E-mail: {wahr, shlee}@ce2.snu.ac.kr

This paper was supported in part by NON DIRECTED RESEARCH FUND, Korea Research Foundation.

Abstract

A natural language query interface for databases makes retrieval user-friendly: users obtain the desired information by querying in their native language. Many natural language query interfaces have been developed for conventional databases. Natural language query interfaces for object-oriented databases, which have recently started to emerge as the next-generation databases, are a new research area. This paper describes a technique for processing natural language representations of path expressions. Because the path expression is one of the key features of the object-oriented data model, a frame-based decomposition method is proposed for efficient processing.

1 Introduction

The objective of natural language interfaces is to take input in human language and extract from it something that is meaningful to a computer. A natural language query interface to a database system gives end users a way to formulate queries in their native language. This is particularly useful because computer-naive users frequently need to access database systems.

INTELLECT[1] from AI Corporation is a commercially available natural language information system. It makes the computer understand everyday English and is designed to be a domain-independent system for relational databases. KDA[2] integrates a natural language query system with a skeleton-based query guiding facility. When a user works with the KDA natural language query system, the query guiding facility supplies several kinds of skeletons to guide the user in performing database retrieval tasks. It generates SQL database queries from English natural language queries. NHI[3] and K-NLQ[4] were developed as Korean natural language query interfaces for relational databases; they accept Korean natural language queries and transform them into QUEL and SQL, respectively. Kim et al.[5] propose a Korean natural language query system that also transforms Korean queries into SQL.

Recently, object-oriented databases (OODB) have started to emerge as the next-generation databases that can model the complicated real world. The field of natural language query interfaces for object-oriented databases has therefore become a new research area. For object-oriented databases, KID[6] has been proposed. This interface transforms Korean queries into the query graphs used in the object-oriented data model.

There are important differences between the object-oriented data model and the relational data model. The object-oriented data model includes the object-oriented concepts of encapsulation, inheritance, path expressions, and arbitrary data types; these concepts are not part of the conventional data model. Among these, the path expression is one of the key features of the object-oriented data model. It is used to retrieve the desired data by navigating the class-attribute hierarchy.

This paper describes a path expression processing technique used in the Korean Interface for Databases (KID) system. The KID employs a frame-based decomposition method to process the natural language representations of path expressions. In this paper, the KID is upgraded to generate OQL (Object Query Language), proposed in ODMG-93[7], instead of query graphs; the format of a query frame is modified; and more basic patterns are identified.

The remainder of this paper is organized as follows. The overview of KID, the schema of a sample database, the basic patterns, and the extended query frame are explained in Section 2. Section 3 describes the technique for processing natural language representations of path expressions. Section 4 shows experimental results consisting of a number of examples that generate OQL from Korean natural language queries. Finally, the conclusion is given in Section 5.

2 Overview of KID

2.1 System Architecture

The KID consists of three modules: a natural language analyzer, a semantic interpreter and an OQL generator. The natural language analyzer accepts Korean queries and generates the appropriate parse trees. The semantic interpreter decomposes the parsing results into query phrases by referring to the database dictionaries and builds a query frame for each query phrase. The OQL generator then integrates these query frames and produces OQL. The block structure of the KID is shown in Figure 1. In the figure, rectangles indicate modules and arrows the flow of processing.

Figure 1: System architecture (Korean natural language queries, natural language analyzer, predicate argument structures, semantic interpreter, query frames, OQL generator, OQL)

Figure 2: Format of a query frame (labels: Frame Name, Parents Frame)

Natural language analyzer: This module performs morphological analysis[8] and parsing to create internal representations, such as parse trees, from Korean queries. The parsing mechanism uses a variation of the CYK algorithm[9]. The KID system employs a general natural language analyzer used in Korean-English machine translation[10]. The natural language analyzer generates two structures: a tree structure and a predicate argument structure[11]. Of these, the semantic interpreter accepts the predicate argument structure.
Semantic interpreter: This module decomposes the predicate argument structures into query phrases (QPs) and builds query frames. It utilizes two database dictionaries: a schema dictionary and a domain dictionary[6]. The schema dictionary specifies schema-related information, and the domain dictionary is used to determine the domain of unknown terms that are semantically ambiguous.

OQL generator: This module integrates all query frames and generates OQL.

A query frame is designed to hold information about the class-attribute hierarchy such as classes, attributes, relationships, values, and operators. The format of a query frame is shown in Figure 2. Compared to the format in [6], two slots are added. One is used to specify the indicated class (or one of its subclasses) when going down the class-attribute hierarchy. The other is used to apply an aggregation function to the corresponding attribute.

2.2 Class-Attribute Hierarchy

Figure 3 shows the sample class-attribute hierarchy used in this paper. It consists of six classes, and `*' indicates multi-valued attributes. In this class-attribute hierarchy, classes have reference attributes, which represent the attribute-domain relationship.

Figure 3: Class-attribute hierarchy (classes shown include Department, Address, University and Professor; attributes shown include dept, snum, height, residence, enrolls*, univ, location, zipcode, country, city, teacher, credit, president and major)

2.3 Basic Patterns

A Korean query can be decomposed into a number of QPs, and each QP is one of the identified basic patterns. By analyzing sample queries, we identify seven basic patterns: head phrase I (HP1), head phrase II (HP2), noun modifier phrase (NMP), verb modifier phrase (VMP), adverb modifier phrase (AMP), verb phrase (VP), and comparative phrase (CP). Of these, NMP, VP and CP were identified previously [6]; the other phrases are newly identified in this paper. The predicate argument structures of the basic patterns are as follows:

HP1:
  HEAD → [HT1 | HT2 | HT4]
  └ [SUB | OBJ] → NOUN Noun

HP2:
  HEAD → HT3
  └ (MOD → ADV [Korean adverb])
  └ MOD → NOUN [Korean noun]

NMP:
  QP-HEAD → Qp-head
  └ MOD → NOUN Noun

VMP:
  QP-HEAD → Qp-head
  └ MOD → VERB Verb

AMP:
  QP-HEAD → Qp-head
  └ MOD → ADV Adverb

VP:
  QP-HEAD → Qp-head
  └ [MOD | VCON] → VERB Verb
  └ [MOD | OBJ | SUB | NCON] → NOUN Noun

CP:
  QP-HEAD → Qp-head
  └ [MOD | VCON] → VERB Verb
  └ SUB → NOUN Noun1
  └ MOD → NOUN Noun2

HEAD represents the head word of a sentence. It is classified into four types: HT1, HT2, HT3 and HT4. QP-HEAD indicates the head word of a QP. MOD denotes modifier, SUB subject, OBJ object, VCON verb conjunction, NCON noun conjunction and ADV adverb. The sign `|' denotes `OR'.

The Korean words corresponding to each head type are as follows.

HT1: words meaning `show', `retrieve', `output', `list'
HT2: words meaning `who', `what', `how', `where', `when'
HT3: words meaning `what number of'
HT4: words meaning `is there', `is not there'

2.4 Definition of Korean Queries

A Korean query (KQ) consists of a head phrase (HP) and a main query (MQ). The MQ is classified into two kinds: simple query (SQ) and composite query (CQ). An SQ is a simple concatenation of several QPs without any conjunction (e.g., `and', `or' or `among'), whereas a CQ contains such conjunctions.
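The distinction between simple and composite queries can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_main_query`, the flat list representation of a main query, and the upper-case indicator labels are hypothetical and are not taken from the KID implementation.

```python
# Assumed labels for the conjunction indicators mentioned in the text.
CONJUNCTIONS = {"AND", "OR", "AMONG"}


def classify_main_query(items):
    """Classify a main query given as a flat list of QP labels and indicators.

    Returns "CQ" (composite query) if any conjunction indicator is present,
    otherwise "SQ" (simple query). The list format is an assumption made
    for this sketch.
    """
    return "CQ" if any(item in CONJUNCTIONS for item in items) else "SQ"


print(classify_main_query(["NMP", "VP"]))        # no conjunction
print(classify_main_query(["CP", "AND", "VP"]))  # contains a conjunction
```

The sketch deliberately ignores how the indicators are detected in the Korean input; it only shows that the SQ/CQ decision depends on the presence of a conjunction.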
If a query contains either of the two Korean words that mean `among' in English, it will have an `AMONG' indicator. The definition of Korean queries is as follows:

  KQ ::= HP MQ
  MQ ::= SQ | CQ
  SQ ::= QP_1 QP_2 ... QP_n   (n ≥ 1)
  CQ ::= SQ_1 <conjunction> SQ_2 ... <conjunction> SQ_m   (m ≥ 2)
  HP ::= HP1 | HP2
  QP ::= NMP | VMP | AMP | VP | CP
  <conjunction> ::= AND | OR | AMONG

3 Path Expression Processing

Path queries have been well studied by database researchers during the past decade. A path query is a query written against nested data that specifies search conditions on that nested data. Instead of just an attribute, a path query contains a sequence of attribute names called a path expression. For example, a type may have an attribute named `dept'; the domain of `dept' may be a Department type; and the Department type may have an attribute named `name'. Then it should be possible to issue a single query saying "find all students whose departments are named `Computer Eng.'." The WHERE clause of the query may contain a predicate .dept.name = `Computer Eng.'. Formally, a path expression is of the form

$$sel.AttrEx_1.\cdots.AttrEx_m$$

where $sel$ is the target class and $AttrEx_i$ ($1 \le i \le m$) are the reference attributes. The above path expression can be decomposed into sub-path expressions, each of which indicates a reference relationship. The decomposed sub-path expressions are shown below.

$$sel.AttrEx_1 \quad (\text{to } Class_2)$$
$$Class_2.AttrEx_2 \quad (\text{to } Class_3)$$
$$\vdots$$
$$Class_{m-1}.AttrEx_{m-1} \quad (\text{to } Class_m)$$

Therefore, the semantic interpreter can decompose the natural language representations of path expressions into QPs of sub-path expressions. For example, a path expression .dept.univ can be decomposed into QP1 and QP2:

  .dept             (QP1)
  Department.univ   (QP2)

4 Experiments

Q1 is an example of the transformation process from Korean queries to OQL.

Q1: (Retrieve students who are taller than 165 and enroll in "Database" which professor "G. D. Hong" teaches.)

Predicate argument structure:

  HEAD → VERB retrieve
  └ OBJ → NOUN students
  └ MOD → VERB be taller
  └ SUB → NOUN height
  └ MOD → NOUN than 165
  └ VCON → VERB and enroll
  └ MOD → VERB teach
  └ MOD → NOUN professor
  └ MOD → NOUN "G. D. Hong"

Figure 4: Decomposition process (nodes numbered 1 to 10 in DFS visiting order, grouped into QP1 to QP4)

Decomposition process: The decomposition process employs the DFS (depth-first search) algorithm. Figure 4 illustrates the process when QP1 is a CP, QP2 and QP3 are VPs, and QP4 is an NMP. The nodes of the tree in Figure 4 denote words in the question, and the numbers above the nodes give the order in which DFS visits them.

Decomposed QPs:

QP1 (CP):
  QP-HEAD → NOUN students
  └ MOD → VERB be taller
  └ SUB → NOUN height
  └ MOD → NOUN than 165

QP2 (VP):
  QP-HEAD → NOUN students
  └ VCON → VERB and enroll

QP3 (VP):
  QP-HEAD → NOUN "Database"
  └ MOD → VERB teach
  └ MOD → NOUN professor

QP4 (NMP):
  QP-HEAD → NOUN professor
  └ MOD → NOUN "G. D. Hong"

Query frames: Figure 5 shows the query frames for Q1.
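The depth-first visiting order used in the decomposition process above can be illustrated with a short Python sketch. The tree shape below is an assumption reconstructed from the decomposed QPs for Q1 (the source text does not preserve the nesting of the predicate argument structure), and the names `Node` and `dfs_order` are hypothetical. The sketch only numbers the nodes; it does not reproduce KID's rules for grouping nodes into QPs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One word of a predicate argument structure (hypothetical type)."""
    word: str
    children: list = field(default_factory=list)


def dfs_order(root):
    """Return the words in depth-first (preorder) visiting order."""
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node.word)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return order


# Assumed nesting for Q1; only the set of ten words comes from the text.
q1 = Node("retrieve", [
    Node("students", [
        Node("be taller", [Node("height"), Node("than 165")]),
        Node("and enroll", [
            Node('"Database"', [
                Node("teach", [
                    Node("professor", [Node('"G. D. Hong"')]),
                ]),
            ]),
        ]),
    ]),
])

# Print each word with its visiting number, as in Figure 4.
for number, word in enumerate(dfs_order(q1), start=1):
    print(number, word)
```

Under this assumed nesting, the comparative phrase about height is visited before the verb phrases about enrollment and teaching, which matches the order QP1 to QP4 in which the decomposed QPs are listed.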
Three classes are involved in Q1, and they are linearly connected by attribute-domain links; i.e., the Professor class is referred to (through `teacher') by the class reached through `enrolls', and that class is in turn referred to by the class being queried.

OQL:
  select x
  from x in, y in x.enrolls
  where x.height > 165 and y.name = "Database" and y.teacher.name = "G. D. Hong"

Q2 shows another example in which three classes are involved but connected differently from Q1; i.e., the Department class (through `dept') and the class reached through `enrolls' are both referred to by the class being queried.

Q2: (Retrieve students who belong to "Computer Eng." and enroll in "Database".)

Figure 5: Query frames for Q1

Figure 6: Query frames for Q2

Predicate argument structure:

  HEAD → VERB retrieve
  └ OBJ → NOUN students
  └ MOD → VERB enrolls
  └ VCON → VERB and belong to
  └ MOD → NOUN "Computer Eng."

Decomposed QPs:

QP1 (VP):
  QP-HEAD → NOUN students
  └ MOD → VERB enrolls

QP2 (VP):
  QP-HEAD → NOUN students
  └ VCON → VERB and belong to
  └ MOD → NOUN "Computer Eng."

Query frames: Figure 6 shows the query frames for Q2.

OQL:
  select x
  from x in, y in x.enrolls
  where x.dept.name = "Computer Eng." and y.name = "Database"

Q3 shows an example in which only one class is involved.

Q3: (What does professor "G. D. Hong" major in?)

OQL:
  select x.major
  from x in Professor
  where x.name = "G. D. Hong"

The experimental results show that about 70% of the sample questions are interpreted correctly when the questions are written by people who have knowledge of databases and of the schema.

5 Conclusion

In this paper, we present a path expression processing technique that transforms Korean natural language queries into OQL. Because path expression processing is one of the important issues in object-oriented query processing, we propose a frame-based decomposition approach to handle the natural language representations of path expressions. Since a path expression can be decomposed into sub-path expressions, Korean queries can be decomposed into query phrases. The decomposed query phrases are transformed into query frames. Finally, all query frames are integrated and OQL is generated.

Improving the performance of a natural language interface requires a large collection of sample queries. In the future, we will concentrate on upgrading the processing capability of the system by collecting and experimenting with a large number of sample queries.

A prototype system of KID has been implemented on a Sun SPARCstation using the C language.

References

[1] L. R. Harris, "Experience with INTELLECT: Artificial intelligence technology transfer," The AI Magazine, Vol. 5, No. 2, 1984, pp. 43-50.
[2] X. Wu and T. Ichikawa, "KDA: A Knowledge-Based Database Assistant with a Query Guiding Facility," Trans. on Knowledge and Data Engineering, Vol. 4, No. 5, 1992, pp. 443-453.
[3] S. Kim, "The Design and Implementation of Interface for Processing Natural Hangul Query," (in Korean) Journal of the Korean Information Science Society, Vol. 12, No. 1, 1985, pp. 31-44.
[4] J. Chae, S. Kim, and S. Lee, "Design and Implementation of a Natural Language DB Query System," (in Korean) Journal of the Korean Information Science Society, Vol. 20, No. 6, 1993, pp. 810-820.
[5] J. M. Kim, M. Y. Hyun, and S. J. Lee, "Korean Natural Language Query System for Searching Database," (in Korean) Proc. of the 21st KISS Fall Conference, Oct. 1994, pp. 637-640.
[6] J. Chae and S. Lee, "Natural Language Query Processing in Korean Interface for Object-Oriented Databases," Proc. of the First International Workshop on Applications of Natural Language to Data Bases, June 1995.
[7] R. G. G. Cattell, The Object Database Standard: ODMG-93, Morgan Kaufmann Publishers, 1993.
[8] S. S. Kang and Y. T. Kim, "Syllable-based Model for the Korean Morphology," Proc. of COLING 94, 1994, pp. 221-226.
[9] J. Yang and Y. T. Kim, "Korean Analysis using Multiple Knowledge Sources," (in Korean) Journal of the Korean Information Science Society, Vol. 21, No. 7, 1994, pp. 1324-1332.
[10] H. G. Lee and Y. T. Kim, "Korean-English Machine Translation based on Idiom Recognition," Proc. of IEEE Region 10 Conference (TENCON '93), 1993.
[11] J. Allen, Natural Language Understanding, Benjamin/Cummings Co. Ltd., 1988.