SEDRIS Transmittal Storing and Retrieval System using Relational Databases

Journal of Database Management, 25(4), 38-65, October-December 2014

Yongkwon Kim, The School of Computing, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
Heejung Yang, The School of Computing, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
Chin-Wan Chung, The School of Computing, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea & The Chongqing Liangjang KAIST International Program, Chongqing University of Technology (CQUT), Chongqing, China

ABSTRACT

Modeling and simulation (M&S) are widely used for design, analysis, and optimization of complex systems and natural phenomena in various areas such as the defense industry and the weather system. In many cases, the environment is a key part of complex systems and natural phenomena. It includes physical aspects of the real world which provide the context for a specific simulation. Recently, simulation systems have increasingly been integrated to work together when they need to exchange information. Interoperability of heterogeneous simulations depends heavily on sharing complex environmental data in a consistent and complete manner. SEDRIS (Synthetic Environmental Data Representation and Interchange Specification) is an ISO standard for representation and interchange of environmental data and is widely adopted in the M&S area. As the size of the simulation increases, the size of the environmental data which should be exchanged between simulations increases. Therefore, efficient management of the environmental data is very important. In this paper, the authors propose storing and retrieval methods of SEDRIS transmittals using a relational database system in order to be able to retrieve data efficiently in the environmental data server cooperating with many heterogeneous distributed simulations.
By analyzing the structure and the content of SEDRIS transmittals, relational database schemas are designed. To reduce query processing time of SEDRIS transmittals, direct storing and retrieval methods which do not require the type conversion of SEDRIS transmittals are proposed. Experimental analyses are conducted to show the efficiency of the proposed approach. The results confirm that the proposed approach greatly reduces the storing time and retrieval time compared to comparison approaches.

Keywords: Relational Databases, SEDRIS, Simulation, STF, XML

DOI: 10.4018/JDM.2014100103

1. INTRODUCTION

With the rapid development of computing technologies, modeling and simulation (M&S) has become an integral part of the modern research and development process (Günal & Pidd, 2011; Stopford & Counsell, 2008). A common representation of the physical environment is a critical element in M&S and is a necessary precondition for the interoperability of heterogeneous simulations (Sedris, 1994). Recently, heterogeneous simulations have increasingly been integrated so that different simulations can work together. Integration of simulations is especially widespread in the defense industry. For example, flight simulations for training several pilots can be integrated with a simulation for command-level training, in which officers are trained to manage complex situations and command thousands of simulated participants (Moller, 2012). Interoperability of heterogeneous simulations depends heavily on sharing complex environmental data in a consistent and complete manner (Ryu, 2009).

The environmental data of distributed simulations is managed by a separate environmental data server that is connected to the network. The environmental data server stores the environmental data before other simulations are conducted, and the environmental data in the environmental data server is not changed after the data is stored. The stored data is utilized as the reference data of a specific environmental condition. A simulation in the network sends a request to the environmental data server in order to get environmental data for a specific location under a specific condition. Between the environmental data server and other simulations, fast data retrieval is essential. If the data retrieval is not fast enough, receiving the environmental data can become a bottleneck of simulations. Thus, for the environmental data server, the performance of the storing side is less important than that of the retrieval side.
The Synthetic Environmental Data Representation and Interchange Specification (SEDRIS) is an ISO standard for representation and interchange of environmental data such as ocean, terrain, atmosphere, and space. SEDRIS is based on five core technology components: the SEDRIS Data Representation Model (DRM), the Environmental Data Coding Specification (EDCS), the Spatial Reference Model (SRM), the SEDRIS interface specification (SEDRIS API), and the SEDRIS Transmittal Format (STF). The DRM, the EDCS, and the SRM are used to achieve an unambiguous representation of environmental data. The SEDRIS API and the STF allow efficient sharing and interchange of the environmental data represented by the other three components. The STF is a conceptual file format that is developed to store SEDRIS transmittals. A SEDRIS transmittal is composed of one or more files containing environmental data. The STF provides a platform-independent interchange mechanism for SEDRIS transmittals. As the simulation becomes more complex, the size of the STF file increases. For example, an infantry simulation requires only atmospheric

data on the ground. However, if a flight simulation cooperates with the infantry simulation, the STF file for these simulations should include atmospheric data in the sky over the infantry as well as environmental data on the ground. As the size of an STF file increases, finding the environmental data that satisfies a specific condition becomes more difficult because there is no way to divide an existing STF file into many sub-files. A huge STF file must be accessed even when a simulation in the network wants just one value. For example, assume that there is a tank simulation and it wants the temperature, the air pressure, and the humidity at the simulated tank's current location in the rain. The location of the simulated tank changes as the simulated tank moves, and its moving direction cannot be known to the environmental data server. Thus, the environmental data server should send the atmospheric data corresponding to the current location of the simulated tank. The atmospheric data of some region in the rain is usually stored together in one STF file because it is impossible to store the atmospheric data of every location in separate STF files, and the entire STF file must be scanned to find a value whenever atmospheric data is requested. Thus, as the size of the environmental data increases, the size of the STF file increases, and accessing the whole STF file to obtain only a few values causes a high overhead. Therefore, keeping SEDRIS transmittals in a file system has a disadvantage in accessing data. In a file system, it is very difficult to find the data needed for a specific purpose, and thus convenient and efficient retrieval of specific data is almost impossible.

A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed easily.
The standard user and application program interface to a relational database is the structured query language (SQL). SQL statements are used both for interactive queries for information from a relational database and for gathering data for reports. In addition to being relatively easy to create and access, a relational database has the important advantage of being easy to extend. When creating a relational database, the domain of possible values in a data column and the constraints that apply to those values can be defined. Therefore, storing SEDRIS transmittals in a relational database provides a convenient and easy way to manage data.

There is a previous approach using a database to store SEDRIS transmittals (Sekar & Lee, 2004). It used the XML format as a common format to build a geospatial data repository. The Fused Data Metamodel (FDM) is defined to represent vector and raster geospatial data along with correlations among those data. The repository is designed as a client-server architecture using a database. FDM is represented using XML schemas and the resulting XML datasets of geospatial data are stored in a database. However, this approach is time-consuming because it requires the XML conversion before integrating geospatial data into a database system. In addition, the authors only suggested the idea of storing geospatial data in a database system. The details and the implementation were left as future work.

In this paper, we propose storing and retrieval methods of SEDRIS transmittals using a relational database system in order to be able to retrieve data efficiently in the environmental data server cooperating with many heterogeneous distributed simulations. Atmospheric data is considered as the input data. To reduce the overhead required for the XML conversion of the previous approach, SEDRIS transmittals of atmospheric data are directly stored in a relational database by dividing the metadata and the data. Relational database schemas for the metadata and the data are defined by examining the structure and the content of SEDRIS transmittals of atmospheric data. Data storing and retrieval algorithms are devised for efficient query processing of SEDRIS transmittals of atmospheric data.

The contributions of this work are summarized as follows:

We propose storing and retrieval methods of SEDRIS transmittals using a relational database system for fast data retrieval without the type conversion of STF. For the environmental data server whose data is not changed after the data is stored, fast data retrieval is much more important than efficient data storing;
Relational database schemas of SEDRIS transmittals are designed by analyzing the structure and the content of SEDRIS transmittals of atmospheric data;
A storing and retrieval system of SEDRIS transmittals using a relational database system is implemented using the SEDRIS API;
Experimental analyses are conducted to show the fast data retrieval of the proposed approach. We compared our approach with two comparison approaches. On average, compared with the first comparison approach, the storing time of our approach is approximately 20 times slower while the retrieval time is approximately 19000 times faster.
On average, compared with the second comparison approach, the storing time of our approach is approximately 1.8 times faster and the retrieval time is approximately 5.8 times faster.

2. RELATED WORK

Enabling interoperability between M&S systems, and allowing coherent data integration from multiple sources, requires a consistent and common approach to handling and converting the data. SEDRIS facilitates the transmission and reuse of environmental data among heterogeneous systems through a standard data representation model and interchange mechanism. The environment is represented by capturing all data elements and their relationships. A standard interchange mechanism and format are provided to support the distribution of environmental data and to promote the sharing of environmental data among heterogeneous systems.

SEDRIS has been used as a global standard data model for synthetic environmental data of simulations since it was developed. SEDRIS is used broadly among the DoD (Department of Defense) Components and M&S User Communities, and most environmental data consumers are SEDRIS users¹. Cox et al. (2002) develop a simulation capability of ports and harbors using SEDRIS. Hembree et al. (2001) devise a framework to store the atmospheric gridded data using SEDRIS. Macchi and Sims (2002) devise a mapping between the Humanoid Animation (H-Anim²) and SEDRIS. Additionally, Campos and Hull (2004) develop the Transmittal Content Requirements Specification (TCRS) used to conduct the SEDRIS data verification for other applications, and Suliman (1999) develops a tool for visualizing SEDRIS data through the web.

SEDRIS can be used for producing data repositories because it fully provides conversion mechanisms between heterogeneous systems. Skowronski (1998, 1999) produces a terrain database for computer generated forces using SEDRIS. Schaefer and De La Cruz (2002) produce a web-based repository for synthetic natural environmental objects using SEDRIS.

However, there are few research papers about SEDRIS because it is mainly used in the military domain, especially defense modeling and simulation. Li et al. (2004) present an interoperability framework of complex distributed simulation systems in order to use C4ISR systems, which are real training systems, together with simulations that use HLA/RTI to communicate with each other. SEDRIS provides a method to express environmental data in their work. Campos et al. (2005) describe the future combat system and use SEDRIS to build the training environment. However, the purposes of the above works are different from that of our research. The database community and SEDRIS users are not familiar with each other.
Therefore, to the best of our knowledge, there is only one work on storing SEDRIS transmittals in a database. Sekar and Lee (2004) propose a framework for creating a geospatial data repository which can store source data along with modified data and correlations among data elements. Geospatial data can be represented by different data formats defined in standards such as OpenGIS, SEDRIS, or Geography Markup Language (GML). A unified data model called the Fused Data Metamodel (FDM) is defined to integrate geospatial data, along with correlations among those data, represented by the above diverse data formats. FDM is represented in the XML format and the resulting XML datasets can be stored in a relational, object-oriented, object-relational, or native XML database management system. Therefore, during the integration, SEDRIS transmittals along with OpenGIS data and GML documents are transformed to XML documents. However, they only show the possibility of storing geospatial data in a database management system. The details and the implementation are left as future work. In addition,

this approach can be time-consuming because it requires XML conversion before integrating geospatial data into a database. This approach is used as a comparison approach. In order to store SEDRIS transmittals in a database using the method proposed by Sekar and Lee, SEDRIS transmittals should be converted to XML documents and the converted XML documents are then stored in a database. Since a relational database is used to store SEDRIS transmittals in our approach, we implement the comparison approach using a relational database.

Bhatt et al. (2004) propose sedonto (Synthetic Environment Data Representation Ontology), which is an ontology to be used within the M&S domain for the representation of data pertaining to a synthetic environment. sedonto is based on the SEDRIS DRM, which is a UML based specification of the various synthetic environment representation classes and their relationships. The Web Ontology Language (OWL), which provides a Web-based formalism for representing taxonomy/domain hierarchies, is used for representing sedonto. There are many approaches to store OWL documents in a relational database. Therefore, after converting SEDRIS transmittals to OWL documents, SEDRIS transmittals can be stored in a relational database. However, their work does not consider storing SEDRIS transmittals in a relational database.

2.1. XML Databases

For decades, much research on developing efficient XML database systems has been conducted. This research is categorized into two classes: the native approaches and the relational approaches. The relational approaches use existing relational database systems to store XML documents and process queries on XML documents. The basic edge approaches and node approaches require many joins, which degrades query processing performance. Yoshikawa et al.
(2001) propose the Path Materialization (PM) approach that stores paths of elements instead of names of elements. Pal et al. (2004) propose a Reverse-Path (RP) approach that stores reversed paths of elements in order to avoid incorrect query answers when recursion exists in XML data. Krishnamurthy et al. (2003) use the DTD of XML documents in order to design table schemas of the relational database. The relational approaches can use existing components of relational databases; however, the cost of structural joins is high.

The native approaches are storage and query processing approaches specialized for XML documents. XML documents are converted to inverted lists, sequences, or trees to efficiently process queries on XML documents. The work of Moro et al. (2005) converts XML documents to inverted lists to efficiently process the structural joins. The work of Wang and Meng (2005) converts XML documents to sequences in order to avoid the expensive joins. The query processing is converted to subsequence matching. The work of Zhang et al. (2004) converts XML documents to trees in the streaming context. Streaming XML documents have to be

traversed in the depth-first order. The native approaches can process the structural joins efficiently; however, other components of the DBMS, such as concurrency control and storage management, should be improved.

With the massive growth of XML for data interoperability purposes in various domains, much research has been conducted on different aspects of XML. Moro et al. (2009) summarize some of the research topics on XML by presenting some of the most relevant and traditional research on XML databases, XML query processing, XML views, XML data matching, and XML schema evolution. Hachicha and Darmont (2013) provide a comprehensive survey of XML tree patterns that are considered crucial in XML querying and its optimization.

In the performance comparison, a scheme using relational approaches and a scheme using native approaches are compared with the proposed scheme. We use a widely used relational DBMS instead of a native XML database because the native XML database space is not as mature as the relational database space³. In addition, most of the commercial DBMSs support the XML format as one of the fundamental data types⁴. According to the works of Nicola and Linden (2005) and Rys (2005), commercial DBMSs also provide native XML storage and query support. In the work of Kurt and Atay (2002), the query processing in the XML-enabled database was multiple times faster than in the native XML database (Kurt & Atay, 2002, p.15).

3. PROPOSED APPROACH

Nowadays, many simulations cooperate for one goal in the M&S area. As the size of the simulation increases, the size of the environmental data which should be exchanged among simulations becomes larger. Thus, simulations should use databases to manage environmental data.
In this paper, storing and retrieval methods of SEDRIS transmittals using a relational database system are proposed in order to be able to retrieve data efficiently in the environmental data server cooperating with many heterogeneous distributed simulations. This research is conducted under the assumption that many simulations cooperate for the same goal, and they are connected in a network. When a simulation wants to obtain environmental data, it issues a query to another simulation which has the desired data. The result of the query is returned as a SEDRIS transmittal. In this section, the storing and retrieval methods of SEDRIS transmittals are explained in detail.

3.1. The Dataset Used

Each SEDRIS transmittal is generated based on its own STF (SEDRIS Transmittal Format), and the STF consists of classes and relationships between classes defined in the DRM (SEDRIS Data Representation Model). Users can design their own STF to store their environmental data using DRM classes as they want. Thus, devising storing and retrieval methods for all kinds of STF is too difficult to accomplish in this research. Therefore, we focus on a specific STF which is generated in Kim et al. (2010).

The dataset used in this paper is a kind of atmospheric data which is collected in and around Jeju island of South Korea. The region covering Jeju island and the adjacent sea is partitioned into a grid of many cells. For each cell, three kinds of attributes (the precipitation, the wind direction, and the wind speed) are collected once an hour for six hours. Three cell sizes are used to gather data. The maximum sizes of each cell are 600m, 600m, and 300m along the x-axis, the y-axis, and the z-axis, respectively. The minimum sizes of each cell along each axis are 200m, 200m, and 100m, respectively.

3.1.1. DRM Structure of the Dataset Used

Figure 1 shows the DRM diagram of the STF of SEDRIS transmittals used in this paper. The DRM diagram plays the role of a schema of SEDRIS transmittals. SEDRIS transmittals used in this paper consist of a metadata part and a table data part. The metadata part represents characteristics of the SEDRIS transmittal such as who made this SEDRIS transmittal, when the SEDRIS transmittal was generated, what the summary of the SEDRIS transmittal is, and so on.

Figure 1. DRM diagram of the dataset used

A table data part is a <Time Related Geometry> class. A table data part contains the data array and the metadata of the data array, which is different from the metadata of the SEDRIS transmittal. The data array is stored in <TPD> instances. The metadata of the data array describes information about the data array using the DRM classes in the table data part in Figure 1.

<Time Constraints Data> denotes when the data in <Property Grid Hook Point> is collected. <Property Grid Hook Point> contains the spatial data. An instance of <Property Grid Hook Point> is used to include a <Property Grid> instance and a <CD 3D Location> instance in order to store the spatial data with the grid. <Absolute Time> presents the time when the data is collected. An instance of <Time Constraints Data> and an instance of <Property Grid Hook Point> are associated with each other. Each SEDRIS transmittal used has six pairs of instances of <Property Grid Hook Point> and <Time Constraints Data>, and all data in the same grid are collected at the same time.

<CD 3D Location> denotes where the data in <Property Grid> is collected. <Property Grid> contains the grid information of the region where the data is collected, the <TPD> of the data table, and the data array. <Classification Data> presents the type of data in <Property Grid>. <TPD> stands for <Table Property Description>, which describes an attribute stored in the SEDRIS transmittal. In this paper, three instances of <TPD> (the precipitation, the wind direction, and the wind speed) are used in each SEDRIS transmittal. Each <TPD> instance has a data array with collected data values.

<Regular Axis> defines the grid of the region. In each SEDRIS transmittal used, there are three instances (X, Y, Z) of the <Regular Axis> class. <Regular Axis> contains the first value, the spacing, and the axis value count. The first value specifies the first numeric value on the axis.
The spacing specifies the distance between adjacent values on the axis. It is related to the size of each cell. The axis value count specifies the number of values on the axis, that is, the number of cells on the axis.

3.2. Storing Method

In this section, the method of storing the SEDRIS transmittal in the relational database system is explained. Figure 2 shows the method of storing the SEDRIS transmittal in the relational database. In order to store the SEDRIS transmittal, two kinds of tables in the relational database are required: the metadata table and the data table. In the metadata table, the exoskeleton file of the SEDRIS transmittal is stored. The exoskeleton file contains the metadata of the SEDRIS transmittal, and it is explained in detail in Section 3.2.1. The data table stores the data array of the SEDRIS transmittal. One data table stores data of only one SEDRIS transmittal while the metadata table stores the metadata of all SEDRIS transmittals in the database. Thus, if a new SEDRIS transmittal is inserted into the relational database, the number of rows of the metadata table increases by one and a new data table for the new SEDRIS transmittal is created.
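The bookkeeping described above (one new metadata row and one new data table per transmittal) can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the authors' implementation, which is built on the SEDRIS API. The function name store_transmittal, the table name metadata, the data_<ID> naming rule, the REAL column type, and the reduced set of metadata columns are all assumptions made for the example; the Time and LocationNumber columns anticipate the data table schema of Section 3.2.2.

```python
import sqlite3


def store_transmittal(conn, area_name, exoskeleton, rows, attr_names):
    """Store one transmittal: one new metadata row plus one new data table.

    rows holds tuples (time, location_number, value_1, ..., value_n),
    with one value per name in attr_names. Attribute names are assumed
    to come from a trusted source, since they are used as column names.
    """
    cur = conn.cursor()
    # Shared metadata table (hypothetical, reduced column set).
    cur.execute(
        "CREATE TABLE IF NOT EXISTS metadata ("
        "ID INTEGER PRIMARY KEY, AreaName TEXT, "
        "ExoskeletonFile BLOB, DataTableName TEXT)"
    )
    # The metadata table grows by exactly one row per transmittal.
    cur.execute(
        "INSERT INTO metadata (AreaName, ExoskeletonFile) VALUES (?, ?)",
        (area_name, exoskeleton),
    )
    transmittal_id = cur.lastrowid
    table = f"data_{transmittal_id}"  # hypothetical naming rule

    # A fresh data table is created for this transmittal only.
    cols = ", ".join(f'"{name}" REAL' for name in attr_names)
    cur.execute(
        f'CREATE TABLE "{table}" ('
        f"Time INTEGER, LocationNumber INTEGER, {cols}, "
        "PRIMARY KEY (Time, LocationNumber))"
    )
    placeholders = ", ".join("?" * (2 + len(attr_names)))
    cur.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)

    # Record which data table belongs to the new metadata row.
    cur.execute(
        "UPDATE metadata SET DataTableName = ? WHERE ID = ?",
        (table, transmittal_id),
    )
    conn.commit()
    return table
```

Calling the function twice on the same connection leaves two rows in the metadata table and two separate data tables, which mirrors the one-table-per-transmittal layout described in this section.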

Figure 2. The storing method of the SEDRIS transmittal to the relational DB

3.2.1. The Exoskeleton File

The SEDRIS transmittal consists of the table data and the metadata which explains the SEDRIS transmittal. The metadata of the data array is also contained in the table data part. Figure 3 shows the structure of the SEDRIS transmittal. The metadata of the SEDRIS transmittal and that of the data array seem not to be necessary for query processing. However, they should be stored in the database because the query result is returned as a SEDRIS transmittal. A SEDRIS transmittal cannot be formed without the metadata.

Figure 3. The structure of the SEDRIS transmittal

The metadata of the SEDRIS transmittal is an essential part of the query result. However, storing it in a database table value by value is very inefficient

because of the following reasons. First, it consists of many simple values in a complex structure. Therefore, parsing and managing the metadata are very costly. Second, the metadata is not modified depending on the query condition. That is, the metadata of the stored SEDRIS transmittal is exactly the same as the metadata of any query result returned as a SEDRIS transmittal.

The metadata of the data array behaves differently from the metadata of the SEDRIS transmittal. All the queries are executed on the table data part, and the size of the data in the query result is different from that of the whole data in the original SEDRIS transmittal. It means that the metadata of the data array in the query result transmittal should be modified depending on the query conditions.

For the convenient management of the database, the exoskeleton file is proposed. The exoskeleton file is designed to store all the metadata in the SEDRIS transmittal. It means that the exoskeleton file contains all the contents of the SEDRIS transmittal except the data arrays of the <TPD> instances. The exoskeleton file is stored in a database as a binary file. Thus, the system cannot access the content of the exoskeleton file in the database. As mentioned above, only the metadata of the data array describing the size of the data array should be modified in the retrieval method. When a query is issued, the system retrieves data according to the query condition and then brings the corresponding exoskeleton file from the database. The data retrieved from the database is added to the exoskeleton file to make the SEDRIS transmittal containing the query result. At that time, the metadata of the data array is modified to keep the metadata and the data array consistent. The retrieval method is explained in Section 3.3 in detail.

3.2.2. DB Schema

In the proposed method, the SEDRIS transmittal is stored directly in a relational database. In this section, the schemas of the metadata table and the data table are explained. Figure 4 shows the schema of the metadata table. Each row of the metadata table stores the information of one SEDRIS transmittal. The metadata of the SEDRIS transmittal is basically stored in the exoskeleton file. However, some information must be made available to a user who wants to issue a query. The necessary information is the metadata of the data array.

Figure 4. The metadata table schema

The columns of the metadata table are as follows:

ID: ID is an index number of the metadata table. ID is the primary key of the metadata table;
AreaName: AreaName is the name of the area where the data stored in the corresponding SEDRIS transmittal is collected. A user can select an appropriate row to find the desired data;
AttrString: AttrString is the concatenation of the stored attribute names of the corresponding SEDRIS transmittal. In the SEDRIS transmittal used in this paper, the AttrString value is Precipitation WindDirection WindSpeed. All attribute names are defined in the EDCS (Environmental Data Coding Specification) which is one of the SEDRIS components;
TimeInfo: TimeInfo is the concatenation of all the time information in the corresponding SEDRIS transmittal. Each time value is converted to a single value using the same encoding. In the SEDRIS transmittal used in this paper, all time information is encoded as year-month-day-hour. Each SEDRIS transmittal used contains the data of six hours;
GridInfo: GridInfo is the concatenation of the grid information along the three axes. Each axis is encoded as (First value) (Spacing) (Axis value count), and then the encoded codes of the three axes are concatenated;
RootFile: The SEDRIS transmittal consists of a root file and a set of data files because one SEDRIS transmittal can contain two or more regions or types of data together. In that case, the SEDRIS transmittal consists of one root file and two or more data files. In the case of the SEDRIS transmittals used in this paper, each SEDRIS transmittal contains only one data file. The root file is stored in the RootFile column as a binary file. The root file is necessary to make the SEDRIS transmittal containing the query result. Its new data file is the exoskeleton file with the retrieved data as a query result;
RootFileSize: RootFileSize is the size of the stored root file;
ExoskeletonFile: The exoskeleton file is stored in the ExoskeletonFile column as a binary file;
ExoskeletonFileSize: ExoskeletonFileSize is the size of the stored exoskeleton file;
DataTableName: DataTableName is the name of the data table which stores the data arrays of the corresponding SEDRIS transmittal. It is used in the retrieval method.

When a user wants to issue a query, the user first requests the metadata information in the metadata table. Then, the user can select the desired row. Some columns such as AttrString, TimeInfo, and GridInfo are used to make a proper query. The values of these columns can be used to ensure that the user issues a query with proper parameters.
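To make the AttrString, TimeInfo, and GridInfo encodings more concrete, the following sketch builds and parses such strings. It is only an illustration under stated assumptions: the paper specifies that values are concatenated and that times are encoded as year-month-day-hour, but the separators used here (a space between items, a semicolon between axes), the zero-padding, and all function names are hypothetical, and the dates and grid values in the usage note are placeholders rather than values from the dataset.

```python
from datetime import datetime


def encode_time(t):
    """Encode one collection time as year-month-day-hour."""
    return f"{t.year:04d}-{t.month:02d}-{t.day:02d}-{t.hour:02d}"


def encode_axis(first_value, spacing, count):
    """Encode one axis as (First value) (Spacing) (Axis value count)."""
    return f"{first_value} {spacing} {count}"


def build_metadata_strings(attr_names, times, axes):
    """Return (AttrString, TimeInfo, GridInfo) for one transmittal."""
    attr_string = " ".join(attr_names)
    time_info = " ".join(encode_time(t) for t in times)
    # Assumed separator: ';' between axes so GridInfo can be parsed back.
    grid_info = ";".join(encode_axis(*axis) for axis in axes)
    return attr_string, time_info, grid_info


def parse_grid_info(grid_info):
    """Recover [(first_value, spacing, count), ...] from a GridInfo string."""
    axes = []
    for code in grid_info.split(";"):
        first_value, spacing, count = code.split()
        axes.append((float(first_value), float(spacing), int(count)))
    return axes
```

For example, `build_metadata_strings(["Precipitation", "WindDirection", "WindSpeed"], [datetime(2000, 1, 1, h) for h in range(6)], [(0.0, 600.0, 10), (0.0, 600.0, 10), (0.0, 300.0, 5)])` yields an AttrString matching the one quoted above, six encoded times, and a GridInfo string that `parse_grid_info` turns back into the three axis tuples. A client could use the parsed axes to check that requested grid ranges fall inside the stored grid before issuing a query.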

Figure 5 shows the schema of the data table. A data table contains the data array of only one SEDRIS transmittal. The columns of the data table are as follows:

Time: Time is the encoded time information. For each SEDRIS transmittal used in this paper, gridded data is stored for six points in time. Thus, the sets of data of six different times are distinguished using this column;
LocationNumber: LocationNumber is an index number of each cell in the grid of the region. The calculation to obtain the location number of a cell is explained below;
TPD_i: TPD_i denotes the name of the i-th attribute. The number of TPD_i columns varies according to the number of attributes contained in the corresponding SEDRIS transmittal.

The location number of a cell is calculated using the location of the cell in the grid. For a cell located at (a, b, c) in the 0-based three-dimensional grid, the location number of the cell is calculated as follows:

LN(a, b, c) = a + b * num_x + c * num_x * num_y

where num_x and num_y are the numbers of values on the x axis and the y axis, and they are included in the metadata of the data array. Since the number of axes is variable, the location number is used instead of the real axis values. The possible number of axes is from one to three.

The primary key of the data table is (Time, LocationNumber). The location number is based on the location of the cell. Thus, it cannot distinguish data values of the same cell collected at different times, so Time is required for the primary key. The primary key of the data table determines a cell at a specific time. With this primary key, the schema of the data table is in Boyce-Codd Normal Form because only the primary key of the data table determines the other values.

3.3. Retrieval Method

In this section, the proposed retrieval method of the SEDRIS transmittal from the relational database is explained.
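As a concrete illustration of how the (Time, LocationNumber) key of Section 3.2.2 supports retrieval, the sketch below implements the location-number formula, its inverse, and a lookup against a data table. It is a hedged example rather than the authors' algorithm: it uses Python's built-in sqlite3 module, and the function names, the representation of a spatial request as three index ranges, and the IN-list query are assumptions. How the actual system converts a location request into location numbers is not detailed at this point in the text.

```python
import sqlite3
from itertools import product


def location_number(a, b, c, num_x, num_y):
    """LN(a, b, c) = a + b * num_x + c * num_x * num_y (0-based cell)."""
    return a + b * num_x + c * num_x * num_y


def cell_of(ln, num_x, num_y):
    """Inverse of location_number: recover (a, b, c) from LN."""
    c, rest = divmod(ln, num_x * num_y)
    b, a = divmod(rest, num_x)
    return a, b, c


def box_to_location_numbers(x_range, y_range, z_range, num_x, num_y):
    """Expand index ranges along each axis into a list of location numbers."""
    return [
        location_number(a, b, c, num_x, num_y)
        for c, b, a in product(z_range, y_range, x_range)
    ]


def retrieve(conn, table, time_value, location_numbers, attrs):
    """Fetch the chosen attribute columns for one time and a set of cells.

    table and attrs are assumed to come from the metadata table, so they
    are interpolated as identifiers; the values are bound as parameters.
    """
    cols = ", ".join(f'"{name}"' for name in attrs)
    marks = ", ".join("?" * len(location_numbers))
    sql = (
        f'SELECT LocationNumber, {cols} FROM "{table}" '
        f"WHERE Time = ? AND LocationNumber IN ({marks}) "
        "ORDER BY LocationNumber"
    )
    return conn.execute(sql, [time_value, *location_numbers]).fetchall()
```

Because both key columns appear in the WHERE clause, the lookup can be served by the primary-key index instead of a scan over all cells. For very large regions a range predicate or a temporary table may be preferable to a long IN list; that choice is outside what the text specifies.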
Figure 6 shows the proposed retrieval method. As mentioned before, the data array of the SEDRIS transmittal is stored in the data table and the other parts are stored in the metadata table. Before a user issues a query, the user can check all the metadata stored in the metadata table. Then, the user selects the row storing the metadata of the SEDRIS transmittal that contains the desired data. The result of a query is returned as a SEDRIS transmittal built from the stored exoskeleton file, which contains all the metadata but no data arrays. To build the query result, the exoskeleton file stored in the selected row is retrieved first.

Figure 5. The data table schema

Figure 6. The retrieval method of the SEDRIS transmittal from the relational DB

A user issues a query with query parameters. The query parameters are as follows:
QTimeInfo: QTimeInfo denotes the collection time of the data desired by the user. In the SEDRIS transmittal used in this paper, six sets of the data grid with different time information exist. Thus, QTimeInfo takes a value from 0 to 5;
QGridInfo: QGridInfo denotes the location information of the desired data. QGridInfo is converted to the set of location numbers used to retrieve data from the corresponding data table;
QAttrInfo: QAttrInfo denotes the set of attribute names which the user wants. The SEDRIS transmittal used in this paper has three attributes. Thus, the user can choose among the three attributes.

With the above three query parameters, the proper data values in the data table are retrieved. However, these values do not form a SEDRIS transmittal until they are added to the exoskeleton file retrieved earlier. As explained in Section 3.2.1, the exoskeleton file consists of the metadata of the SEDRIS transmittal and the metadata of the data array. The metadata of the SEDRIS transmittal does not depend on the query condition, while the metadata of the data array must be modified according to the query parameters. The modified classes of the metadata of the data array are as follows:

<Property Grid Hook Point>: Only the instances of <Property Grid Hook Point> that correspond to QTimeInfo belong to the query result. The others are deleted from the retrieved exoskeleton file;
<Regular Axis>: QGridInfo contains fewer cells than the regular axis information of the exoskeleton file. Thus, the instances of <Regular Axis> are modified to match QGridInfo. The first value and the axis value count along the axis are modified;
<TPD>: Among the instances of <TPD>, only those for attributes contained in QAttrInfo remain in the exoskeleton file.

After the modification of the exoskeleton file, the retrieved data values are added to the modified exoskeleton file. Then, the exoskeleton file with the retrieved data values is returned to the querying user as the query result. Figure 7 shows the proposed retrieval algorithm.

Figure 7. The proposed retrieval algorithm

3.3.1. Screenshot of the Developed System

Figure 8 shows a screenshot of the developed storing and retrieval system for SEDRIS transmittals. The Store Transmittal button is used to store a SEDRIS transmittal. For retrieval, a user first checks Load metadata. Then, all the metadata in the metadata table are shown to the user. The user can select the table name and the time interval. The grid information is entered using the text boxes of each axis. Attribute List is activated when the user selects the table name, and the user can then choose the attributes to be retrieved.
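The retrieval steps of Section 3.3 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: SQLite stands in for DBMS-X, the table name data_table and the attribute names TPD_1 to TPD_3 are placeholders, the stored values are synthetic, QGridInfo is assumed to be given as a first index and a cell count per axis, and the <Regular Axis> update assumes arithmetic spacing.

```python
import sqlite3
from itertools import product

ATTRS = ("TPD_1", "TPD_2", "TPD_3")  # placeholder attribute names


def location_number(a, b, c, num_x, num_y):
    return a + b * num_x + c * num_x * num_y


def build_data_table(conn, num_x, num_y, num_z, num_times=6):
    """Create a data table keyed by (Time, LocationNumber) with synthetic values."""
    cols = ", ".join(f"{name} REAL" for name in ATTRS)
    conn.execute(
        f"CREATE TABLE data_table (Time INTEGER, LocationNumber INTEGER, {cols}, "
        "PRIMARY KEY (Time, LocationNumber))"
    )
    marks = ", ".join("?" * (2 + len(ATTRS)))
    rows = [
        (t, ln, *(t * 10000 + ln * 10 + i for i in range(len(ATTRS))))
        for t in range(num_times)
        for ln in range(num_x * num_y * num_z)
    ]
    conn.executemany(f"INSERT INTO data_table VALUES ({marks})", rows)


def retrieve(conn, q_time, q_grid, q_attrs, num_x, num_y):
    """q_grid = ((x0, nx), (y0, ny), (z0, nz)): first index and count per axis."""
    unknown = set(q_attrs) - set(ATTRS)
    if unknown:  # column names cannot be bound as parameters, so validate them
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    (x0, nx), (y0, ny), (z0, nz) = q_grid
    # Convert the queried region to the set of location numbers.
    lns = [
        location_number(a, b, c, num_x, num_y)
        for c, b, a in product(
            range(z0, z0 + nz), range(y0, y0 + ny), range(x0, x0 + nx)
        )
    ]
    cols = ", ".join(q_attrs)
    marks = ", ".join("?" * len(lns))
    sql = (
        f"SELECT LocationNumber, {cols} FROM data_table "
        f"WHERE Time = ? AND LocationNumber IN ({marks}) ORDER BY LocationNumber"
    )
    return conn.execute(sql, [q_time, *lns]).fetchall()


def modified_axis(first_value, spacing, start_index, count):
    """New first value and value count for a <Regular Axis> (arithmetic spacing assumed)."""
    return first_value + start_index * spacing, count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_data_table(conn, num_x=4, num_y=3, num_z=2)
    grid = ((1, 2), (0, 2), (1, 1))
    for row in retrieve(conn, 2, grid, ["TPD_1", "TPD_3"], num_x=4, num_y=3):
        print(row)
```

The query touches a single table and is answered through the primary key on (Time, LocationNumber); the returned rows would then be written into the modified exoskeleton file. For very large regions, a range predicate or a temporary table of location numbers would be preferable to a long IN list.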

Figure 8. The screenshot of the storing and retrieval system for SEDRIS transmittals

4. COMPARISON APPROACH

In this section, the comparison approach (Sekar & Lee, 2004) is explained. As explained in Section 2, Sekar et al. propose a framework and an XML-based unified data model called FDM for creating a geospatial data repository. To the best of our knowledge, this is the only work on storing SEDRIS transmittals in a database. However, FDM is a data model for geospatial data, and the data used by Sekar et al. is not the same as the SEDRIS transmittals used in this paper, which contain atmospheric data. Therefore, for the performance comparison, we design an approach that converts the SEDRIS transmittal to an XML document. Then, the converted XML document is inserted into a relational database. The rules for converting the SEDRIS transmittal to the XML document are as follows:
A class of the SEDRIS transmittal is converted to an element of the XML document;
A component class of a class of the SEDRIS transmittal is converted to a sub-element of the corresponding element of the XML document;

A member variable of a class of the SEDRIS transmittal is converted to an attribute of the corresponding element of the XML document;
If a member variable of a class is not a simple value but a structured variable, it is converted to a sub-element of the corresponding element of the XML document.

Association is one of the relationships between DRM classes. When a pair of classes is associated, neither class is a component of the other. In the converted XML document, however, one class becomes a sub-element of the other. For example, in the SEDRIS transmittal used in this paper, <Time Constraints Data> and <Property Grid Hook Point> are associated with each other. An instance of <Time Constraints Data> denotes the time at which the data in an instance of <Property Grid Hook Point> is collected. In this case, <Property Grid Hook Point> is regarded as a component of <Time Constraints Data> because instances of <Property Grid Hook Point> are distinguished based on the instances of <Time Constraints Data>.

4.1. Storing Algorithms

XML documents can be stored in relational databases in two ways. First, most commercial database systems treat XML as one of the fundamental data types. Thus, an XML document can be stored in a single column of a relational database, just like an integer value. Second, several studies on converting XML documents to relational database schemas have been conducted over the years. In this way, an XML document is parsed according to its XML schema and stored in many tables of a relational database. In this paper, the former and the latter are called the XMLType method and the XMLConverter method, respectively. A commercial relational database management system that supports managing XML data in addition to relational data is used as the relational database. This relational database management system is denoted by DBMS-X.
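Since both storing methods operate on the converted XML document, a minimal sketch of the conversion rules listed at the start of Section 4 may be useful. It is an illustration only: class instances are modeled as plain Python dictionaries, the keys 'class', 'fields', and 'components' and all field names and values are hypothetical, and only one level of structured variables is handled.

```python
import xml.etree.ElementTree as ET


def to_tag(class_name):
    """Turn a class name such as '<Regular Axis>' into an XML-safe tag."""
    return class_name.strip("<>").replace(" ", "_")


def convert(instance):
    """Convert one class instance (a dict in this sketch) to an XML element.

    'class'      -> element name (rule: class to element)
    'fields'     -> attributes, or sub-elements for structured variables
    'components' -> sub-elements (rule: component class to sub-element);
                    associated classes are nested here as well, as in the text
    """
    elem = ET.Element(to_tag(instance["class"]))
    for name, value in instance.get("fields", {}).items():
        if isinstance(value, dict):
            # Structured member variable: becomes a sub-element.
            sub = ET.SubElement(elem, name)
            for key, val in value.items():
                sub.set(key, str(val))
        else:
            # Simple member variable: becomes an attribute.
            elem.set(name, str(value))
    for component in instance.get("components", []):
        elem.append(convert(component))
    return elem


# Hypothetical instance data, loosely following the example in the text.
example = {
    "class": "<Time Constraints Data>",
    "fields": {"time_index": 0},
    "components": [
        {
            "class": "<Property Grid Hook Point>",
            "components": [
                {
                    "class": "<Regular Axis>",
                    "fields": {
                        "axis_value_count": 4,
                        "first_value": {"unit": "degree", "value": 33.0},
                    },
                }
            ],
        }
    ],
}

print(ET.tostring(convert(example), encoding="unicode"))
```

In this sketch the associated class <Property Grid Hook Point> is simply listed under 'components' of <Time Constraints Data>, which mirrors how the text treats the association as a component relationship in the converted document.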
The XML document converted from a SEDRIS transmittal in this research is stored in a relational database in both ways. In the XMLType method, the XML document is stored in a single column. Figure 9 shows the schema of the table for the converted XML documents in the XMLType method. As in the proposed method, the exoskeleton file is also required because the metadata of the SEDRIS transmittal is an essential part of the query result. The columns of the XML table schema are very similar to those of the metadata table in Figure 4. The only difference is that the external data table is not required. The XMLDoc column is for the converted XML document, and its data type is the XML type. Thus, one converted XML document is stored in one row of the XML table.

Figure 9. The XML table schema

Among the several tools and studies for converting XML documents to relational database schemas, the Advanced XML Converter 5 is used. It parses the input XML file and groups tags by name, then converts the XML document to SQL statements that represent relational database schemas. In the XMLConverter method, the XML files used in this research are stored in twenty tables, where the number of tables is equal to the number of distinct elements in the XML document. In order to return the query result as a SEDRIS transmittal, the exoskeleton file is also required.

4.2. Retrieval Algorithms

The retrieval algorithms of the XMLType method and the XMLConverter method are explained in this section. DBMS-X provides the functionality to directly search an XML document stored in a column of the XML type. Thus, XQuery is used as the query language in the XMLType method. The result of an XQuery query is an XML document. The final query result is then obtained by converting this XML document to a SEDRIS transmittal, because the query result should be returned as a SEDRIS transmittal, as in the proposed approach. Unlike in the proposed approach, the metadata is also stored in the XMLDoc column, which holds the entire converted XML document. However, an XML version of the exoskeleton file is still required. XQuery specifies the elements desired by the querying user in the query statement. The content of each retrieved element is the same as that in the XML document; it cannot be modified. In the converted XML document, the elements for the metadata are at higher levels than the elements for the data array.
Thus, there is no way to retrieve only the desired data in the data array together with the metadata using XQuery. Therefore, the exoskeleton file is required here. To return the query result as a SEDRIS transmittal, the XML document is converted to a SEDRIS transmittal. Before the conversion, the retrieved data is added to the exoskeleton file. The metadata of the table data part of the exoskeleton file is modified depending on the query parameters. The process of modifying the exoskeleton file is very similar to that of the proposed approach except for the file formats, STF and XML. In the XMLConverter method, XQuery cannot be used as the query language because the XML document is parsed and stored in many tables. To retrieve values according to the query, a query in XQuery format is converted to an equivalent SQL query. The query result is converted to a set of XML elements, which is equivalent to the query result in the XMLType method. Then, the converted query result is finally converted to a SEDRIS transmittal through the same procedure as in the XMLType method.
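The reason an XML-version exoskeleton is needed in the XMLType method can be illustrated with a small sketch. It is not the authors' implementation: the path support of Python's ElementTree is used as a stand-in for XQuery on DBMS-X, and the element and attribute names (Transmittal, Hook_Point, Axis, Cell, ln) are hypothetical. A path query returns either the data cells without their ancestor metadata, or an ancestor with all of its cells unchanged, so the selected cells have to be added to a separately modified exoskeleton.

```python
import copy
import xml.etree.ElementTree as ET

DOC = """
<Transmittal name="demo">
  <Hook_Point time="0">
    <Axis name="x" first_value="0" axis_value_count="4"/>
    <Cell ln="0" TPD_1="0.0"/><Cell ln="1" TPD_1="0.1"/>
    <Cell ln="2" TPD_1="0.2"/><Cell ln="3" TPD_1="0.3"/>
  </Hook_Point>
  <Hook_Point time="1">
    <Axis name="x" first_value="0" axis_value_count="4"/>
    <Cell ln="0" TPD_1="1.0"/><Cell ln="1" TPD_1="1.1"/>
    <Cell ln="2" TPD_1="1.2"/><Cell ln="3" TPD_1="1.3"/>
  </Hook_Point>
</Transmittal>
"""


def make_exoskeleton(doc_root):
    """Copy of the document with every data cell removed (metadata only)."""
    exo = copy.deepcopy(doc_root)
    for hook in exo.findall("Hook_Point"):
        for cell in hook.findall("Cell"):
            hook.remove(cell)
    return exo


def query(doc_root, exo_root, q_time, first, count):
    """Select cells by time and index range, then add them to the exoskeleton."""
    wanted = {str(ln) for ln in range(first, first + count)}
    # The path query yields the cells only; their metadata is not included.
    cells = [
        cell
        for cell in doc_root.findall(f"./Hook_Point[@time='{q_time}']/Cell")
        if cell.get("ln") in wanted
    ]
    result = copy.deepcopy(exo_root)
    for hook in result.findall("Hook_Point"):
        if hook.get("time") != str(q_time):
            result.remove(hook)  # drop hook points outside the queried time
    hook = result.find("Hook_Point")
    axis = hook.find("Axis")
    axis.set("first_value", str(first))  # the index doubles as the axis value here
    axis.set("axis_value_count", str(count))
    hook.extend([copy.deepcopy(cell) for cell in cells])
    return result


root = ET.fromstring(DOC)
exo = make_exoskeleton(root)
print(ET.tostring(query(root, exo, q_time=1, first=1, count=2), encoding="unicode"))
```

The result would still have to be converted from XML to a SEDRIS transmittal, which is the additional type conversion that the proposed approach avoids.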

5. EXPERIMENTAL RESULTS

The SEDRIS transmittals explained in Section 3.1 are used for the experiments. Figure 10 shows the specifications of the three SEDRIS transmittals. DBMS-X is used as the relational database management system. The performance is compared using the storing time and the retrieval time. Figure 11 shows the flow of the storing and retrieval procedures of the three comparison schemes, which are as follows:
SEDRIS: This scheme is the proposed approach, which does not require type conversion to store SEDRIS transmittals in and retrieve them from the relational database;
XMLType: This scheme converts the SEDRIS transmittal to an XML document. The XML document is stored using the relational database, which supports XML data management. The XML data type of DBMS-X is used to manage the XML documents in the relational database. Since the query results are returned in XML format, type conversion is required to transform the results into SEDRIS transmittals. DBMS-X provides an index to process XQuery efficiently for the XML data type. XMLType (w/ index) and XMLType (w/o index) denote the experimental results of the XMLType scheme with and without an index, respectively;

Figure 10. The SEDRIS transmittals and their storing times

Figure 11. The flow of storing and retrieval procedures of three schemes

XMLConverter: This scheme converts the SEDRIS transmittals to XML documents and then stores the XML documents in the relational database by applying an XML-to-relational mapping. Advanced XML Converter is used as the mapping tool. As with XMLType, type conversion of the query results is required.

Figure 10 contains the storing times of the three SEDRIS transmittals. SEDRIS and XMLConverter are slower than XMLType. In the XMLType method, an XML document is stored as a single value in the database. Thus, it requires only one DB transaction per XML document. As shown in Figure 10, the storing time of SEDRIS is 7 times slower than that of XMLType (w/o index) and 32 times slower than that of XMLType (w/ index). In the cases of SEDRIS and XMLConverter, in contrast, the input is parsed and the data values are stored separately, which makes their storing times longer. Between the two results of the XMLType method, XMLType (w/ index) is slower than XMLType (w/o index) because XMLType (w/ index) includes the time to construct an index. Between SEDRIS and XMLConverter, SEDRIS is 1.86 times faster than XMLConverter because SEDRIS separately stores only the data array of the SEDRIS transmittal, whereas XMLConverter separately stores all types of elements in the XML document.

Figures 14, 15, 16, and 17 show the retrieval times for the three SEDRIS transmittals. The two types of queries shown in Figures 12 and 13 are used to compare the retrieval time. # x values, # y values, and # z values denote the numbers of cells included in the query region along the x-axis, the y-axis, and the z-axis, respectively. # attributes denotes the number of attributes to be retrieved by the query. # retrieved values is the number of values retrieved by the query. The numbers of retrieved values of the small queries in Figure 12 are less than fifty, and those of the large queries in Figure 13 are greater than fifty.

Figure 12. The specification of small queries

Figure 13. The specification of large queries

Figure 14. The retrieval time of Jeju_Lev1 (small queries)

Figure 15. The retrieval time of Jeju_Lev1 (large queries)

Figure 16. The retrieval time of Jeju_Lev2

Figure 17. The retrieval time of Jeju_Lev3

Figure 18. The overall processing time of Jeju_Lev1 (small queries)

Figure 14 shows the four retrieval times when the small queries are executed on the Jeju_Lev1 SEDRIS transmittal. All retrieval methods retrieve the values satisfying the query conditions from the database one value at a time. In this figure, SEDRIS and XMLConverter are much faster than the two schemes of the XMLType method. The retrieval time of SEDRIS is 30709 times faster than that of XMLType (w/o index) and 7696 times faster than that of XMLType (w/ index). In the XMLType method, the XML document stored in the database is traversed to retrieve the proper values for each query. This is a very costly operation because the size of the XML document for Jeju_Lev1 is over 75MB. Although XMLType (w/ index) is better than XMLType (w/o index) due to the index, neither result is comparable with those of SEDRIS and XMLConverter. Between SEDRIS and XMLConverter, SEDRIS is 5.8 times faster than XMLConverter because SEDRIS requires no join operation, unlike XMLConverter. Because the retrieval times of the XMLType methods are incomparably longer than those of the other methods, only SEDRIS and XMLConverter are compared from here on.

Figure 15 shows the retrieval times of SEDRIS and XMLConverter when the large queries are executed on the Jeju_Lev1 SEDRIS transmittal, and Figures 16 and 17 show the retrieval times of the two methods when all queries are executed on Jeju_Lev2 and Jeju_Lev3, respectively. According to Figures 15, 16, and 17, SEDRIS is faster than XMLConverter because SEDRIS does not require join operations.

Next, the storing times and retrieval times of the comparison schemes are considered together. As mentioned before, the environmental data server is constructed before the simulations are conducted. Thus, storing and retrieval do not proceed together. In addition, the data server may have to process many queries after the data is stored once.
Usually, a simulation that requests environmental data needs a large amount of data on a regular basis. Thus, we compare the overall performance of the comparison schemes with a workload that consists of one storing operation and 100 retrievals. Figure 18 shows the overall performance of the comparison schemes using the small queries and the Jeju_Lev1 SEDRIS transmittal. According to the result, SEDRIS shows the best performance. According to the experimental results, the proposed approach greatly reduces the retrieval time compared to the two methods using XML conversion. In addition, the proposed approach reduces the storing time compared to XMLConverter, whose retrieval time is the closest to that of the proposed approach.

5.1. Technical and Practical Implications

In the previous section, the proposed scheme is compared with two XML-based comparison schemes in terms of the storing time and the retrieval time. XMLType stores the XML file converted from an STF file as a native XML type of DBMS-X. Although index structures are used, the retrieval time of XMLType is the slowest among