A SPATIAL AND TEMPORAL DATA INTEGRATION METHOD FOR HETEROGENEOUS DATABASE ENVIRONMENTS

NAOKI ISHIBASHI
Graduate School of Media and Governance, Keio University
Fujisawa, Kanagawa 252, Japan
E-mail: naoki@mag.keio.ac.jp

YOSHIHIDE HOSOKAWA
Institute of Information Sciences and Electronics, University of Tsukuba
Tsukuba, Ibaraki 305, Japan
E-mail: hosokawa@osss.is.tsukuba.ac.jp

YASUSHI KIYOKI
Faculty of Environmental Information, Keio University
Fujisawa, Kanagawa 252, Japan
E-mail: kiyoki@sfc.keio.ac.jp

ABSTRACT

Creating an interconnection mechanism among a large number of legacy databases significantly increases the value of those databases. To interconnect heterogeneous legacy databases with a conventional approach, static and explicit pattern descriptions must be implemented to represent inter-relationships such as equality, synonymity, similarity or topology. However, the overhead of the static descriptions is not negligible, especially in a global area network where many legacy databases are connected. Many systems exist for computing various relationships among local data items, but each has its own application scope, and an integration framework for those methods has not been established. In this paper, we present a data integration method for a meta-level system which interconnects heterogeneous databases by computing spatial and temporal inter-relationships dynamically. This method also integrates heterogeneous computational systems in the meta-level system. The application scope of the method can be expanded through hybridization of several systems for computing relationships among data items. The main feature of the method is that it realizes data integration among heterogeneous databases by computing spatial and temporal relationships in a context-dependent way.
Keywords: database integration, multidatabase system, distributed databases, temporal database system, spatial database system

1 INTRODUCTION

With the rapid progress of global network and database technologies, a large number of legacy databases have been connected to the wide-area network. Those databases have been constructed and accessed independently in the wide-area network environment. Implementing an interconnection mechanism among these legacy databases significantly increases their value[2]. In particular, it is effective to introduce the concept of spatial and temporal database computations into the meta-level of multidatabase environments, because this concept realizes the interconnection among heterogeneous databases according to spatial and temporal contexts.

The meta-level of a multidatabase system is an abstracted, higher layer above the local databases, and it is constructed independently of the local systems. Interconnection according to spatial and temporal contexts means joining databases by computing spatio-temporal semantics and spatio-temporal relationships which involve contexts defined in specific relationships among local databases. In this paper, we present a system architecture and an implementation method of a multidatabase system which realizes the interconnection by computing spatial and temporal relationships according to spatial and temporal contexts.

In a conventional approach for querying and integrating heterogeneous databases, the properties among data values such as equality, synonymity, similarity or topology, which are defined as relationships, must be described statically as pairs of pattern descriptions. The pattern descriptions of one database are matched against the pattern descriptions of another database by using the pattern-matching technique. This process, which we call a relationship conversion, is how the conventional approach realizes computational mechanisms for relationships other than equality.
Currently, a large number of accessible databases are connected to the global network. Therefore, the overhead of generating and updating the static descriptions becomes heavy when many kinds of relationships among a large number of databases must be converted to the single computational mechanism of equality. There are several research activities on converting the relationships among heterogeneous databases to equality automatically or semi-automatically. For example, an evaluation method of schema similarities using neural networks[9] and an evaluation method for equality among data values of heterogeneous databases using ontology[5] have been studied.

Our research goals are to provide a system framework for computing the relationships between data values of heterogeneous databases and to realize data integration for heterogeneous databases in the global network environment. No existing data integration method provides mechanisms for computing many kinds of relationships comprehensively. That is, the conventional data integration systems, each of which computes specific relationships, have been designed and implemented independently. Therefore, they have their own application scopes, operations and data formats, independent of one another. A framework for integrating these data integration systems so that they share their functionalities and data representations is essentially needed.

The systems in [1][10] compute topological, directional and connective relationships among temporal representations in a one-dimensional Euclidean space. The system in [4] computes topological, directional and connective relationships among spatial representations in a three-dimensional Euclidean space. The system presented in [7][8][13] computes semantic equality or similarity in the meanings of words according to contexts. In this system, a 2000-dimensional orthogonal space is defined as the computation space for semantics, and data values and contexts are mapped as vectors onto the space.

To realize a framework for dynamically computing spatial and temporal relationships between data values of heterogeneous databases, we formalize a hybrid computational method that integrates computational systems for spatial and temporal relationships. This method is designed in a meta-level, which corresponds to an abstracted level of local systems, independently of the implementations of legacy databases. We categorize contexts for the data integration into three types, and our method provides a capability to perform spatial and temporal computations in a context-dependent way by implementing a mechanism which manages and executes queries according to all three types of contexts.

(CX1) The contexts to select a computational system: x
The context x is used in the meta-level system to select which computational system is required for the query.

(CX2) The contexts to select a computational space: s
The meta-level system uses the context s to select a computational space of the computational system selected with x. For example, both a global geographic space and local spaces for CAD could be independently defined in the meta-level system of our method, and they would be chosen with s.
(CX3) The contexts to compute a relationship in a computational system: r
The context r represents a context required for computing a relationship between data values on the space selected by s. Some conventional computational systems have capabilities to recognize various contexts. For example, the method of [7][8][13] determines the meaning of a word according to the given context, and it computes similarity between words while eliminating synonymity which is not required for the query. The context r is thus context information that depends on a specific computational system.

Our method is used for realizing database integration according to the contexts r of spatial and temporal relationships. In [6], a framework of the hybrid computational method to manage the contexts x and s is formalized.

In this paper, we call relationships which are fixed with r dynamic relationships, and relationships that are fixed between a pair of data values without any r static relationships. To compute dynamic relationships, the conventional approach, which only has a computational mechanism for a static relationship such as equality, requires as many descriptions of relationship conversions as there are values of r. Specifically, a dynamic relationship between a pair of data values would be described statically as n_r equalities, where n_r is the number of values of r. However, n_r becomes practically infinite when r ranges over a continuous domain such as the real numbers, so the proposed data integration mechanism is nearly impossible to implement with the conventional approach. As an example of r, the spatial relationship A Distance_less_than(r) B, which represents that the distance between A and B is less than r, takes its meaning from the distance context r.
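
The contrast between a statically described relationship and a dynamic one can be sketched in a few lines of Python. This is only an illustration under assumptions: the names distance_less_than, static_descriptions and static_lookup, the point representation and the sample coordinates are hypothetical and do not come from the systems discussed in this paper.

```python
import math

# Hypothetical 2-D points standing in for the data values A and B.
A = (0.0, 0.0)
B = (3.0, 4.0)  # Euclidean distance from A is exactly 5.0


def distance_less_than(a, b, r):
    """Dynamic relationship 'a Distance_less_than(r) b' for any real context r."""
    return math.dist(a, b) < r


# Illustrative stand-in for the conventional approach: one static
# description per context value chosen in advance (here 30, 60 and 90).
static_descriptions = {r: distance_less_than(A, B, r) for r in (30, 60, 90)}


def static_lookup(r):
    """Return the stored answer, or None when the context r was never described."""
    return static_descriptions.get(r)
```

With the dynamic function, any context value such as 5.1 can be evaluated directly, whereas the static table can only answer for the contexts that were enumerated beforehand.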
Another example of a dynamic relationship is A Covers(r) B, which represents that a ratio r of an area B is covered by an area A, and which takes its meaning from the ratio context r. These spatial relationships should be computed without relationship conversions because r is continuous.

In this paper, we present a data integration method for heterogeneous databases with the following features. The first feature is that our method interconnects heterogeneous databases dynamically by computing relationships according to user contexts defined in a continuous value domain. This feature contributes to data integration among legacy databases. Through the interconnection, our method makes it possible to produce more information from legacy databases than the conventional approach. The second feature is the integration of computational systems that have the first feature by using the method of [6]. Such hybridization of existing computational methods strengthens their functionalities for interconnecting legacy databases, and its implementation overhead is less than that of the conventional approach.

We have designed computational systems for spatial and temporal relationships with the first feature, and those systems are integrated using the second feature. The implementation of these computational systems provides users with a query environment with spatial and temporal contexts, and legacy databases are interconnected spatio-temporally. In this paper, we show several experimental results to evaluate whether our method reduces the overhead of implementing multidatabase systems in the global network environment. Those experimental results compare the proposed method and the conventional approach in terms of the implementation overhead.

[Figure 1: Processes of the interconnection. A user query is processed in three steps: Step-1, converting various data descriptions (heterogeneous data types or formats defined in the local databases L1 and L2); Step-2, computing various relationships (M1, M2); Step-3, integrating results of computational systems (M3).]

Implementation methods of multidatabase systems are categorized into the following two methods in terms of the data integration among heterogeneous databases. Our method is classified as Method B.

Method A (the conventional approach): data integration among heterogeneous databases by pattern-matching common data descriptions in the meta-level.
An implementation method for global-schema data integration among heterogeneous databases provides a common schema for legacy databases, and database integration is performed by pattern-matching local data values converted onto the common schema[3]. Implementing multidatabase systems with the global-schema approach amounts to applying a single database computation mechanism, like those of RDBMSs, to the heterogeneous database environment.

Method B (our method): data integration among heterogeneous databases by combining multiple relationship computational systems in the meta-level.
As implementation methods for data integration with multiple computational systems, ND/SR+[11] and an associative search scheme for computing pattern descriptions and their semantics[13] have been presented. ND/SR+ combines a pattern-matching mechanism and the region algebra to integrate structured data, like those of an RDB, and semistructured data, like HTML repositories.
The associative search method of [13] integrates a pattern-matching mechanism and a semantic associative search mechanism for querying multimedia repositories according to both objective and subjective points of view. As an implementation method for integrating spatial databases, common metadata structures and an implementation scheme of a standard API for heterogeneous GISs[12] have been presented.

2 THE DATA INTEGRATION METHOD

The meta-level functions, which compute spatial and temporal relationships among legacy databases, are categorized as follows:

(FT1) function1(value1, value2) -> boolean
(FT2) function2(value1, value2, JC) -> boolean

FT1 computes a static relationship between value1 and value2, which represent data values of legacy databases, and returns true if the relationship exists. FT2 computes a dynamic relationship among value1, value2 and JC, where JC represents the context denoted r in the previous section. JC corresponds to the user's context which stipulates the relationship between value1 and value2.

To interconnect heterogeneous databases with relationships corresponding to FT1 and FT2, a framework for uniformly integrating the computational systems is required. Our data integration method, which combines several FT1s and FT2s, is formalized by using the following four items.

(ME1) The meta-level computational system, M_g (g = 1, ..., l)
M_g formalizes the facility for joining different local databases by combining its FT1 and FT2 functions. It also provides a single computational space structure and some standard data value structures in its computational space. For example, the space structure for the spatial computational system is the Euclidean space structure, and a representative data structure is a point in the Euclidean space. The meta-level functions in this computational system are applied to data values in the corresponding meta-level computational space.

(ME2) The local database system, L_h (h = 1, ..., m)
L_h denotes a local database system which is connected to the meta-level system. The local database system provides the target data items which are integrated in the meta-level system.

(ME3) The query, Q_i (i = 1, ..., n)
Q_i denotes a user query including meta-level data integration functions, such as FT1s and FT2s, and/or the user's contexts JC which are used for joining heterogeneous legacy databases.

(ME4) The conversion function, C_j^[g,h] (j = 1, ..., o)
To map local data to the corresponding data in M_g, the conversion function C_j^[g,h] is defined between L_h and M_g. The behavior of C_j^[g,h] is very complicated in conventional multidatabase systems, because they require conversion functions to translate many kinds of relationships among different databases into equivalent equalities. In our method, C_j^[g,h] is simple to define in comparison with those in the conventional approach.

Fig. 1 shows the processes of the data integration between two local databases L1 and L2 by combining two FT1s and one FT2. This integration is performed by the following procedure.

(Step-1) Transforming and projecting data items in local databases to the standard data representations used in the corresponding computational system in the meta-level system.
(Step-2) Joining the data items transformed in Step-1 by using FT1 and/or FT2 in each meta-level computational system.

(Step-3) Integrating the meta-level data items created by Step-2 among meta-level computational systems.

Our method provides a facility for adding suitable computational systems for a specific relationship, and the minimum quantity of required C_j^[g,h] is in O(l m), where l represents the number of computational systems and m represents the number of local databases. Since a computational system contains at least one meta-level function, l does not exceed t, where t represents the number of relationships.

[Figure 2: An architecture of the multidatabase system. The meta-level system consists of a meta-level query processor, meta-level function interpreters 1 to 3, and the spatial, temporal and equality computation systems. Local database system 1 consists of function interpreter 1, DBMS 1, DB 1-1 and DB 1-2; local database system 2 consists of function interpreter 2, DBMS 2 and DB 2.]

A conventional multidatabase system is described as a system which has a single M_g and a complicated set of C_j^[g,h] as the static descriptions of required relationships. In this case, the minimum quantity of required C_j^[g,h] for static relationships is in O(t mC2), where t represents the number of relationships, m represents the number of databases, and mC2 = m(m-1)/2 is the number of pairs of databases. For the dynamic relationships, the minimum quantity of C_j^[g,h] is in O(t u mC2), where u represents the number of JC values if it is countable. This overhead is extremely critical when interconnecting a large number of databases in the global network with many relationships.
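
To give a feel for how these three bounds diverge, the following Python sketch evaluates their leading terms for sample sizes. It is an illustration only: the function names and the sample values of l, t, u and m are hypothetical, and the returned numbers are the expressions inside the O(...) bounds, not measured rule counts.

```python
from math import comb


def conversions_proposed(l, m):
    """Leading term l * m: one conversion per (computational system, database)."""
    return l * m


def conversions_static(t, m):
    """Leading term t * mC2: one description per relationship and database pair."""
    return t * comb(m, 2)


def conversions_dynamic(t, u, m):
    """Leading term t * u * mC2 when each relationship has u countable contexts."""
    return t * u * comb(m, 2)


# Hypothetical sizes: 3 computational systems, 5 relationships,
# 91 context values and 10 local databases.
print(conversions_proposed(3, 10))     # 30
print(conversions_static(5, 10))       # 225
print(conversions_dynamic(5, 91, 10))  # 20475
```

Under these assumed sizes the proposed method grows linearly in the number of databases, while the static descriptions grow with the number of database pairs and are further multiplied by the number of context values.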
With our data integration method, the quantity of static descriptions is greatly reduced by implementing various meta-level functions which evaluate relationships between data items directly.

3 IMPLEMENTATIONS

To implement our data integration method, we have developed the multidatabase system shown in Figure 2. The meta-level system consists of a meta-level query processor, meta-level computational systems and function interpreters. The meta-level query processor interprets queries given by users to the meta-level system. The meta-level computational systems compute relationships among data values of local systems. The meta-level function interpreters translate meta-level primitive functions to the corresponding operations of the meta-level computation systems. The function interpreters for local database systems convert the meta-level operations to the corresponding local operations. These function interpreters also act as the conversion functions C_j^[g,h], so they translate query results to meta-level data representations.

The meta-level system contains the following three computational systems:

(M1) Spatial Computation System

To interconnect local databases spatially, we have defined Egenhofer's model of spatial relationships[4] as M1 and implemented it in the meta-level system. The system of M1 includes 72 FT1s and 3 FT2s.

(M2) Temporal Computation System
For temporal interconnections, Allen's temporal interval model[1] is defined as M2 and implemented as a computational system of the meta-level system. The system of M2 has 14 FT1s and 3 FT2s.

(M3) Equality Computation System
We have implemented a computational system for equality as M3, and M3 has only a single FT1.

4 EXPERIMENTS

We have performed experiments to clarify the availability of our meta-level system from the viewpoint of system implementation. The experimental environment has been designed to show differences in implementation overheads between the proposed system and the conventional system in spatial and temporal query contexts. The implementation overheads of the two systems are defined by the amount of C_j^[g,h], for the following reasons:

1. C_j^[g,h] is defined to specify the semantics of legacy databases as meta-level representations.
2. The total amount of C_j^[g,h] for a meta-level system depends on the number of legacy databases connected to the meta-level, so it reflects the implementation overhead.

4.1 EXPERIMENT METHOD

We have compared the implementation overhead of the proposed system and the overhead of the conventional system at a fixed accuracy of their query processing. The accuracy is defined by the following two items:

Precision = (number of correct answers in matched results) / (number of matched results)
Recall = (number of correct answers in matched results) / (number of correct answers)

We prepared the results of the mathematical procedures of [1][4] as the correct answers for the spatial and temporal data integrations.

4.2 SYSTEM ENVIRONMENT

(1) Local databases
As shown in Table 1, we have constructed the following local databases, which contain spatial and temporal attributes.
(L1) A museum database with 100 tuples (PostgreSQL V6.5.3)
(L2) A train database with 6,210 tuples (UniSQL V3.5.3.2J)

(2) Queries
The following queries are issued across the legacy databases:

(Q1[r]) Link museums and trains with the following condition, where JC_r varies continuously in the range from 5 to 95:
1. A train arrives at the station JC_r minutes or less before a museum opens.

(Q2[q]) Link museums and trains with the following condition, where JC_q varies continuously in the range from 5 to 95:
1. The distance between a museum and a station is less than JC_q.

(Q3[q,r]) Link museums and trains with the following conditions, where JC_q and JC_r vary continuously in the range from 5 to 95:
1. The distance between a museum and a station is less than JC_q.
2. A train arrives at the station JC_r minutes or less before the museum opens.

(3) Meta-level systems and conversion functions
We have implemented two different multidatabase environments to compare our implementation method and the conventional approach, as follows:

Method A: The meta-level system of Method A is designed as a representative of a conventional system.

(MLS1) A conventional meta-level system
As a representative of conventional methods, MLS1 has only M3 for computing any group of relationships. Therefore, it requires relationship conversions for the spatial and temporal queries listed above, and a large number of relationship conversions would be required for the best performance on the queries. Here, the following 8 relationship conversions are statically described and implemented for spatial and temporal computations. MLS1 uses combinations of these relationship conversions.

For temporal relationship conversions, local data values of L1 and L2 have been converted to the common pattern descriptions RC1 to RC4 if the pairs of data values satisfy the following temporal relationships:

(RC1) (C_t1^[3,1]): A train arrives at a station at the same time as a museum opens.
(RC2) (C_t2^[3,1]): A train arrives at a station 30 minutes or less before a museum opens.
(RC3) (C_t3^[3,1]): A train arrives at a station 60 minutes or less before a museum opens.
(RC4) (C_t4^[3,1]): A train arrives at a station 90 minutes or less before a museum opens.

Table 1: Local Databases

a: Local database 1 (L1)

  dataid  name  time         address
  0       M000  09:30-19:30  lmpr
  1       M001  11:30-19:30  chjm
  2       M002  10:00-17:00  gbmo
  :       :     :            :
  97      M097  09:00-19:30  ceph
  98      M098  10:00-16:30  zsji
  99      M099  12:30-16:30  ohil

b: Local database 2 (L2)

  dataid  name  trainid  direction  station  arrival  departure
  0       T1    T1UP01   UP         S1       05:22    05:25
  1       T1    T1UP01   UP         S2       05:32    05:35
  2       T1    T1UP01   UP         S3       05:42    05:45
  :       :     :        :          :        :        :
  6207    T4    T4DN88   DN         S14      23:43    23:46
  6208    T4    T4DN88   DN         S24      23:53    23:56
  6209    T4    T4DN88   DN         S25      24:03    24:06

Table 2: Meta-level functions required for the experiments

  Q_i      M_g  Type  Function description                     Definition
  Q1[r]    M2   FT1   tt1_before(s, tv1, tv2) -> boolean       tv1 is before tv2
           M2   FT1   tt1_meets(s, tv1, tv2) -> boolean        tv1 meets tv2
           M2   FT2   td1_lt(s, tv1, tv2, td) -> boolean       distance between tv1 and tv2 is less than td
           M3   FT1   sp_equal(s, s1, s2) -> boolean           s1 is equal to s2
  Q2[q]    M1   FT2   gnd1_lt(s, gv1, gv2, gd) -> boolean      distance between gv1 and gv2 is less than gd
           M3   FT1   sp_equal(s, s1, s2) -> boolean           s1 is equal to s2
  Q3[q,r]  M1   FT2   gnd1_lt(s, gv1, gv2, gd) -> boolean      distance between gv1 and gv2 is less than gd
           M2   FT1   tt1_before(s, tv1, tv2) -> boolean       tv1 is before tv2
           M2   FT1   tt1_meets(s, tv1, tv2) -> boolean        tv1 meets tv2
           M2   FT2   td1_lt(s, tv1, tv2, td) -> boolean       distance between tv1 and tv2 is less than td
           M3   FT1   sp_equal(s, s1, s2) -> boolean           s1 is equal to s2
For spatial relationship conversions, local data values of L1 and L2 have been converted to the common pattern descriptions RC5 to RC8 if the pairs of data values satisfy the following conditions:

(RC5) (C_s1^[3,1]): A train arrives at a station whose location is the same as that of a museum.
(RC6) (C_s2^[3,1]), (C_s2^[3,2]): The distance between a museum and a station where a train arrives is less than 30.
(RC7) (C_s3^[3,1]), (C_s3^[3,2]): The distance between a museum and a station where a train arrives is less than 60.
(RC8) (C_s4^[3,1]), (C_s4^[3,2]): The distance between a museum and a station where a train arrives is less than 90.

Method B: The meta-level system of Method B represents our multidatabase model.

(MLS2) Our meta-level system
Our method provides the framework to compute relationships among legacy databases with multiple computational systems. Therefore, MLS2 is able to contain all of M1, M2 and M3. Because M1 and M2 are implemented in the meta-level system, the spatial and temporal relationships are computed directly in MLS2 without any relationship conversion. MLS2 uses the meta-level functions shown in Table 2 for the spatial and temporal computations. The following conversion functions have been constructed for Method B:

(RC9) (C_1^[2,1]): RC9 converts data values of L1 to temporal data representations. That is, the temporal intervals during which a museum is open are mapped onto the one-dimensional space representation.
(RC10) (C_1^[2,2]): RC10 converts data values of L2 to temporal intervals that represent the period of time during which a train stays at a station.
(RC11) (C_1^[1,1]): RC11 converts data values of L1 to the spatial data representation. That is, the location of the museum is mapped as a point in the two-dimensional space.
(RC12) (C_1^[1,2]): RC12 converts data values of L2 to the spatial points of the locations of each station.
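
The temporal part of Method B can be pictured with a small Python sketch: conversions in the spirit of RC9 and RC10 map local time fields to intervals, and interval functions are then evaluated directly for any context value. Everything here is an assumption made for illustration: the function names, the minute-based interval representation, and in particular the way arrives_within expresses the condition of Q1[r] are hypothetical and are not the paper's implementation of tt1_before, tt1_meets or td1_lt.

```python
def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes after midnight (hours above 23 are allowed)."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def museum_interval(time_field):
    """RC9-like conversion: an opening-hours field such as '09:30-19:30'."""
    start, end = time_field.split("-")
    return (to_minutes(start), to_minutes(end))


def train_interval(arrival, departure):
    """RC10-like conversion: the stay of a train at a station as an interval."""
    return (to_minutes(arrival), to_minutes(departure))


def before(tv1, tv2):
    """Allen-style 'before': tv1 ends strictly earlier than tv2 starts."""
    return tv1[1] < tv2[0]


def meets(tv1, tv2):
    """Allen-style 'meets': tv1 ends exactly when tv2 starts."""
    return tv1[1] == tv2[0]


def arrives_within(train, museum, jc_r):
    """Assumed reading of Q1[r]: arrival is jc_r minutes or less before opening."""
    gap = museum[0] - train[0]
    return 0 <= gap <= jc_r


museum = museum_interval("09:30-19:30")  # museum M000 from Table 1
train = train_interval("09:02", "09:05")  # hypothetical train stop
```

Because the interval functions take the context as an argument, the same two conversions serve every value of JC_r; no additional description is needed when the context changes from, say, 30 to 20 minutes.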

[Figure 3: Results of MLS1: Q1[r]. Vertical axis: precision and recall (0.7 to 1); horizontal axis: temporal distance contexts. Plotted series: Precision(RC1) to Precision(RC4) and Recall(RC1) to Recall(RC4); bars mark the scope of computational contexts.]

[Figure 4: Results of MLS1: Q2[q]. Vertical axis: precision and recall (0.7 to 1); horizontal axis: spatial distance contexts. Plotted series: Precision(RC5) to Precision(RC8) and Recall(RC5) to Recall(RC8); bars mark the scope of computational contexts.]

Table 3: Computable contexts in each system: Q1[r]

Precision and Recall  Method A  Method B
0.7                   0.7826    1.0000
0.8                   0.6521    1.0000
0.9                   0.3695    1.0000
= 1.0                 0.0652    1.0000

Table 4: Computable contexts in each system: Q2[q]

Precision and Recall  Method A  Method B
0.7                   0.5869    1.0000
0.8                   0.4130    1.0000
0.9                   0.2173    1.0000
= 1.0                 0.0434    1.0000

4.3 EXPERIMENTAL RESULTS

Figure 3 shows the computable scope of temporal contexts for Q1[r] under the query processing of MLS1. We assume a situation in which users accept a slightly relaxed requirement on query accuracy, such as 0.8 for both precision and recall. In Figure 3, the thick gray bars show the scope of contexts for which both precision and recall are more than 0.8. Under this condition, MLS1 could compute 65.22% of all temporal contexts, as shown in Table 3.

Figure 4 shows the computable scope of spatial contexts for Q2[q] under the query processing of MLS1. MLS1 could compute 41.30% of all spatial contexts under the condition that both precision and recall are more than 0.8, as shown in Table 4.
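The quantity reported in Tables 3 and 4 can be read as the fraction of contexts for which both precision and recall exceed a threshold. The following Python sketch shows that computation under this reading. The function names and the sample scores are hypothetical and are not the experimental data; returning 0.0 for an empty result set is a convention chosen here.

```python
from typing import Dict, Set, Tuple


def precision_recall(retrieved: Set[int], correct: Set[int]) -> Tuple[float, float]:
    """Precision and recall of one query result against the correct answers."""
    hits = len(retrieved & correct)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall


def computable_scope(scores: Dict[int, Tuple[float, float]], threshold: float) -> float:
    """Fraction of contexts whose precision AND recall both exceed the threshold."""
    computable = sum(
        1 for precision, recall in scores.values()
        if precision > threshold and recall > threshold
    )
    return computable / len(scores)


# Hypothetical (precision, recall) per distance context.
sample_scores = {
    5: (1.00, 0.50),
    10: (0.90, 0.85),
    15: (0.81, 0.95),
    20: (0.60, 1.00),
}
print(precision_recall({1, 2, 3, 4}, {2, 3, 5}))
print(computable_scope(sample_scores, 0.8))
```

The strict comparison matches the wording "more than 0.8"; the "= 1.0" rows of the tables would need an equality test instead.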
To fix the computable scope of MLS1, we described 108,515 rules for the temporal relationship conversions RC1-RC4, and 85,369 rules for the spatial relationship conversions RC5-RC8. The precision and recall of MLS2 are always equal to 1 for all contexts, as presented in Tables 3 and 4, because the query results of MLS2 are computed with mathematical procedures that compute the correct answers. We described 6,310 data conversion rules for each of the spatial and temporal relationships. Given these results, and because MLS2 does not require any relationship conversion for the queries, the implementation overhead of MLS2 was less than 1/13 of that of MLS1 for maintaining a specified accuracy on either a single spatial context or a single temporal context. Therefore, the implementation overhead of MLS2 was drastically lower than that of MLS1.

Figure 5 shows the scope of computable spatial and temporal contexts for Q3[q,r] under the query processing of MLS1. The painted regions of the figure show the scope of computable contexts under the condition that both precision and recall are more than 0.8. The computable scope of MLS1 covered 27.27% of all spatial and temporal contexts, as shown in Table 5. To fix the computable scope of MLS1, we described 193,887 rules for the temporal relationship conversions RC1-RC4, and 12,620 rules for the spatial relationship conversions RC5-RC8. Given these results, the implementation overhead of MLS2 was less than 1/15 of that of MLS1 for maintaining a specified accuracy on the spatial and temporal contexts. The implementation overhead of the meta-level system for computing composite contexts increased in comparison with that for computing a simple context.

In multidatabase environments, a framework that combines several computational mechanisms for computing relationships dynamically is needed, because those relationships are not statically described like a schema within a single database.
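The overhead ratios quoted above can be checked from the rule counts with a few lines of Python. The variable names are hypothetical. For the composite query, the sketch divides 193,887 by 12,620; reading 12,620 as twice the 6,310 data conversion rules of MLS2 is an assumption made here to reproduce the stated 1/15, not something the text says explicitly.

```python
# Rule counts as stated in the text.
mls2_rules_per_relationship = 6_310
mls1_temporal_rules = 108_515      # RC1-RC4, single temporal context
mls1_spatial_rules = 85_369        # RC5-RC8, single spatial context
composite_large_count = 193_887    # stated for Q3[q,r]
composite_small_count = 12_620     # stated for Q3[q,r]

# Single-context queries: how many MLS1 rules per MLS2 rule.
ratio_temporal = mls1_temporal_rules / mls2_rules_per_relationship
ratio_spatial = mls1_spatial_rules / mls2_rules_per_relationship

# Composite query, under the assumption described in the lead-in.
ratio_composite = composite_large_count / composite_small_count

print(f"temporal:  {ratio_temporal:.2f}")
print(f"spatial:   {ratio_spatial:.2f}")
print(f"composite: {ratio_composite:.2f}")
```

Both single-context ratios exceed 13, which agrees with "less than 1/13", and the composite ratio exceeds 15, which agrees with "less than 1/15". The count 6,310 also equals the total number of records in Table 1 (100 in L1 plus 6,210 in L2).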
In the conventional approach, a large amount of static description is required to integrate heterogeneous databases, because many relationships and contexts must be computed with a single computational mechanism of pattern matching. We have thus clarified the usefulness of our method from the viewpoints of effectiveness and accuracy.

[Figure 5: Results of MLS1: Q3[q,r]. Vertical axis: spatial distance contexts (5 to 95); horizontal axis: temporal distance contexts (5 to 95).]

Table 5: Computable contexts in each system: Q3[q,r]

Precision and Recall  Method A  Method B
0.7                   0.5548    1.0000
0.8                   0.2726    1.0000
0.9                   0.0921    1.0000
= 1.0                 0.0160    1.0000

5 CONCLUSION

In this paper, we have presented a data integration method for heterogeneous legacy databases that combines equality, similarity, synonymity, topological relationships, directional relationships and distance relationships for spatial and temporal data. The main feature of the system is that it provides computational mechanisms for both static and dynamic relationships, integrated in a meta-level architecture. The differences between the conventional approach and our method have been examined through experiments in terms of the implementation overhead of each system, and the experiments show that our method reduces the implementation overhead. As future research, we will design and implement a computational system for data mining in the meta-level system and an active database mechanism that deals with the computation of spatial and temporal relationships among heterogeneous databases.

References

[1] J.F. Allen, Maintaining Knowledge about Temporal Intervals, Communications of the ACM, Vol.26, 1983, 832-843.
[2] M.W. Bright, A.R. Hurson and S. Pakzad, A Taxonomy and Current Issues in Multidatabase Systems, Computer, 25(3), 1992, 50-60.
[3] C. Batini, M. Lenzerini and S.B. Navathe, A comparative analysis of methodologies for database schema integration, ACM Computing Surveys, 18(4), 1986, 324-364.
[4] M.J. Egenhofer, Spatial Relations: Models, Inferences, and their Future Application, Proceedings of Advanced Database Symposium, Tokyo, Japan, December 2-4, 1996, separate volume.
[5] A. Goñi, E.
Mena and A. Illarramendi, Querying Heterogeneous and Distributed Data Repositories using Ontologies, Information Modelling and Knowledge Bases IX, IOS Press, 1998, 19-34.
[6] Y. Hosokawa, N. Ishibashi, Y. Yashiro and Y. Kiyoki, A Data Integration Method Realizing Evaluation for Temporal and Spatial Relationships in a Multidatabase Environment, Information Processing Society of Japan Transactions on Databases, Vol.40, No.SIG 8(TOD4), 1999, 95-111.
[7] T. Kitagawa and Y. Kiyoki, The mathematical model of meaning and its application to multidatabase systems, Proc. 3rd IEEE Int. Workshop on Research Issues on Data Engineering: Interoperability in Multidatabase Systems, 1993, 130-135.
[8] Y. Kiyoki, T. Kitagawa and Y. Hitomi, A fundamental framework for realizing semantic interoperability in a multidatabase environment, Journal of Integrated Computer-Aided Engineering, 2(1), 1995, 3-20.
[9] W.S. Li and C. Clifton, Semantic Integration in Heterogeneous Database Using Neural Networks, Proceedings of the 20th VLDB Conference, 1994, 1-12.
[10] Y. Masunaga, A temporal expansion to the multimedia object model in OMEGA, Proc. of DASFAA 95, 1995, 430-440.
[11] A. Morishima and H. Kitagawa, A Data Modeling and Query Processing Schema for Integration of Structured Document Repositories and Relational Databases, Proceedings of the 5th International Conference on Database Systems for Advanced Applications (DASFAA 97), 1997, 145-154.
[12] Open GIS Consortium, The OpenGIS(TM) Guide, Introduction to Interoperable Geoprocessing, Part 1, 1996, Available via WWW from http://www.opengis.org/public/.
[13] N. Yoshida, Y. Kiyoki and T. Kitagawa, An Associative Search Method Based on Symbolic Filtering and Semantic Ordering for Database Systems, Proc. of 7th IFIP 2.6 Working Conference on Database Semantics (DS-7), 1997, 215-237.