DEVELOPING A GEOSPATIAL SEARCH TOOL USING A RELATIONAL DATABASE IMPLEMENTATION OF THE FGDC CSDGM MODEL

Similar documents
Developing a Geospatial Search Tool Using a Relational Database Implementation of the FGDC CSDGM Model

Presented by Kit Na Goh

7. METHODOLOGY FGDC metadata

ESRI stylesheet selects a subset of the entire body of the metadata and presents it as if it was in a tabbed dialog.

Answer the following general questions: 1. What happens when you right click on an icon on your desktop? When you left double click on an icon?

How to Create Metadata in ArcGIS 10.0

Esri s ArcGIS Enterprise. Today s Topics. ArcGIS Enterprise. IT4GIS Keith T. Weber, GISP GIS Director ISU GIS Training and Research Center

QGIS LAB SERIES GST 103: Data Acquisition and Management Lab 1: Reviewing the Basics of Geospatial Data

LAB 1: Introduction to ArcGIS 8

Army ITAM GIS: Utilizing ArcSDE and SDSFIE to Support the Enterprise

Esri s Spatial Database Engine. Today s Topics. ArcSDE. A spatial database engine that works on

Metadata Requirements for SanGIS Data Layers

Working with Metadata in ArcGIS

existing Click the Description Tab

Metadata or "data about data" describe the content, quality, condition, and other characteristics of data. The Federal Geographic Data Committee

Metadata or "data about data" describe the content, quality, condition, and other characteristics of data. The Federal Geographic Data Committee

Geospatial Metadata Development and Use

DATA SHARING AND DISCOVERY WITH ARCGIS SERVER GEOPORTAL EXTENSION. Clive Reece, Ph.D. ESRI Geoportal/SDI Solutions Team

Introduction to ArcCatalog

System Administration

Contact: Systems Alliance, Inc. Executive Plaza III McCormick Road, Suite 1203 Hunt Valley, Maryland Phone: / 877.

SHARING GEOGRAPHIC INFORMATION ON THE INTERNET ICIMOD S METADATA/DATA SERVER SYSTEM USING ARCIMS

Unpacking Instructions

Chapter 15 Creating and Editing Metadata

GISCI GEOSPATIAL CORE TECHNICAL KNOWLEDGE EXAM CANDIDATE MANUAL AUGUST 2017

Metadata: The Theory Behind the Practice

LSGI 521: Principles of GIS. Lecture 5: Spatial Data Management in GIS. Dr. Bo Wu

EarthCube and Cyberinfrastructure for the Earth Sciences: Lessons and Perspective from OpenTopography

Spatial Data Standards for Facilities, Infrastructure, and Environment (SDSFIE)

GST 101: Introduction to Geospatial Technology Lab 2 - Spatial Data Models

GEOSPATIAL ERDAS APOLLO. Your Geospatial Business System for Managing and Serving Information

Enterprise Geographic Information Servers. Dr David Maguire Director of Products Kevin Daugherty ESRI

Compass INSPIRE Services. Compass INSPIRE Services. White Paper Compass Informatics Limited Block 8, Blackrock Business

Introduction to Geographic Information Systems Spring 2016

Spemmet - A Tool for Modeling Software Processes with SPEM

Introduction to QGIS: Student Workbook

Using ArcCatalog. GIS by ESRI

Geospatial Databases: Metadata, Standards, and Infrastructure

An UML-XML-RDB Model Mapping Solution for Facilitating Information Standardization and Sharing in Construction Industry

The What, Why, Who and How of Where: Building a Portal for Geospatial Data. Alan Darnell Director, Scholars Portal

Understanding and Using Metadata in ArcGIS. Adam Martin Marten Hogeweg Aleta Vienneau

ArcGIS 9. Using ArcCatalog

Setting Up Resources in VMware Identity Manager (On Premises) Modified on 30 AUG 2017 VMware AirWatch 9.1.1

ArcSDE 8.1 Questions and Answers

Executive Summary SOLE SOURCE JUSTIFICATION. Microsoft Integration

Getting Started With LP360

Parallel Geospatial Data Management for Multi-Scale Environmental Data Analysis on GPUs DOE Visiting Faculty Program Project Report

Lab Assignment 4 Basics of ArcGIS Server. Due Date: 01/19/2012

A Metadata Standard for IGI&S: Spatial Data Standards for Facilities, Infrastructure, and Environment - Metadata (SDSFIE-M)

InCLUDE Data Exchange. Julia Harrell, GISP GIS Coordinator, NC DENR

Introduction to ArcGIS Server 10.1

Worksheet for minimally-compliant FGDC metadata

Brown University Libraries Technology Plan

Leveraging metadata standards in ArcGIS to support Interoperability. Aleta Vienneau and Marten Hogeweg

Understanding and using Metadata across the ArcGIS Platform. Aleta Vienneau Marten Hogeweg

hrs. Designing Fundamentals 2 Paper-II: Data base hrs. management Systems Semester IV 3 Paper-I: Web

AGIC 2012 Workshop Leveraging Free RDBMS in ArcGIS

ArcGIS Server: publishing geospatial data to the web using the EEA infrastructure

The MVC client server architecture of the BSC-OS portal to digest, manage, and query SWAT data collections

IMAGERY FOR ARCGIS. Manage and Understand Your Imagery. Credit: Image courtesy of DigitalGlobe

TIM 50 - Business Information Systems

International Multidisciplinary Metadata Workshop 18 January Rebecca Koskela Arctic Region Supercomputing Center

INSPIRE WS2 METADATA: Describing GeoSpatial Data

Copyright 2002, 2004 ESRI All rights reserved. Printed in the United States of America.

A USER S GUIDE TO REGISTERING AND MAINTAINING DATA SERVICES IN HIS CENTRAL 2.0

ArcGIS 9. Using ArcCatalog

Setting Up Resources in VMware Identity Manager (SaaS) Modified 15 SEP 2017 VMware Identity Manager

GIS Basics for Urban Studies

Leveraging metadata standards in ArcGIS to support Interoperability. David Danko and Aleta Vienneau

SDSFIE Online: What's New and Improved

Exercise 1: Getting to know ArcGIS

Spatial Data Standards for Facilities, Infrastructure, and Environment (SDSFIE)

CISER Data Archive Collection Policy

A GML-Based Open Architecture for Building A Geographical Information Search Engine Over the Internet

Mississippi Public Schools 2015

Metadata Topic Harmonization and Semantic Search for Linked-Data-Driven Geoportals -- A Case Study Using ArcGIS Online

Student Usability Project Recommendations Define Information Architecture for Library Technology

West Virginia Site Locator and West Virginia Street Locator

Tutorial 1: Finding and Displaying Spatial Data Using ArcGIS

Introduction to Geodatabase and Spatial Management in ArcGIS. Craig Gillgrass Esri

ENGRG 59910: Introduction to GIS

TDNet Discover User Manual

SEXTANT 1. Purpose of the Application

GEOG 487 Lesson 2: Step-by-Step Activity

InCLUDE Data Exchange. Julia Harrell, GISP GIS Coordinator, NC DENR

Welcome to NR402 GIS Applications in Natural Resources. This course consists of 9 lessons, including Power point presentations, demonstrations,

Wendy Thomas Minnesota Population Center NADDI 2014

Introduction to IBM DB2

Introduction to INSPIRE. Network Services

SCENTOR: Scenario-Based Testing of E-Business Applications

A Hands-on Experience with Arc/Info 8 Desktop

PODS Lite. Technical Overview and Guide

Setting Up Resources in VMware Identity Manager. VMware Identity Manager 2.8

Geodatabases. Dr. Zhang SPRING 2016 GISC /03/2016

Geog 469 GIS Workshop. System Requirements

A Method for Representing Thematic Data in Three-dimensional GIS

Intelligent Traffic System: Road Networks with Time-Weighted Graphs

Data publication and discovery with Globus

ImageNow eforms. Getting Started Guide. ImageNow Version: 6.7. x

Chapter 1. Types of Databases and Database Applications. Basic Definitions. Introduction to Databases

Transcription:

DEVELOPING A GEOSPATIAL SEARCH TOOL USING A RELATIONAL DATABASE IMPLEMENTATION OF THE FGDC CSDGM MODEL Kit Na Goh, 1 Keith Weber, 1 GISP, Daniel P. Ames, 2 PhD, PE 1 GIS Training and Research Center, Idaho State University 2 Department of Geosciences, Idaho State University Paper UC1011 Abstract The GIS Training and Research Center (GIS TReC) at Idaho State University provides the public with free access to approximately 11,000 GIS datasets stored within its spatial library. As is the case with many data libraries throughout the world, the GIS TReC data library stores data in file archives (using zip compression) to reduce data storage requirements. However, this approach is not conducive to live web-based data queries. Rather, visitors are required to browse a massive directory of folders and sub-folders to find the needed data. In order to facilitate better discovery and delivery of geospatial data, we are developing and deploying a relational database containing geospatial metadata documentation for all datasets within the spatial library and an intelligent web interface to help website visitors more efficiently find required geospatial data. This paper describes the relational database design and the process followed to import existing XML-based metadata documents into the new fully searchable metadata database. 1.0 Introduction The Idaho State University (ISU) Geographic Information System Training and Research Center (GIS TReC) is a university-wide facility administered by the Office of Research serving all ISU colleges and departments and the GIS community of southeastern Idaho and is a member of the ESRI-managed Geography Network (geographynetwork.com). The mission of the GIS TReC is to facilitate decision-making through the use and application of state-of-the-art geospatial technologies. The GIS TReC maintains over 11,000 GIS datasets within its spatial library and allows remote users to freely access both raster and vector GIS data via a simple web-based directory structure. The datasets included in this library primarily describe features within the GIS TReC Area of Concern (AOC figure 1) including for example digital elevation model data, digital orthophoto quads, and vegetation. The library also includes data from outside the AOC such as regional, national, and world data administrative boundaries and large scale environmental data.

Figure 1: Area of Concern 1.1 The Current System The current data delivery mechanism of the GIS TReC spatial library requires clients to browse the spatial library directory of folders and files to find the data they need and then download the data to a local desktop computer to perform GIS mapping and analysis tasks and operations. This approach, though perhaps the most simple spatial library implementation, has several drawbacks including: 1) users must know the name of the required zip file and containing folders; 2) users can not easily search the data to discover new or otherwise useful data; and 3) users must download the entire zip archive for a particular data set before they can open it and browse the contents (including the associated metadata). Figures 2 and 3 illustrate the process followed by a client using the spatial library to find geospatial data. An inherent problem with this approach is that users are not certain if the data they have chosen is correct for their application until they have downloaded, extracted, and previewed it in an application like ArcCatalog. The existing GIS TReC geospatial library directory structure includes non-archived HTML-based metadata files stored within the same folder as the dataset it describes so that they can be indexed by Google and other search engines. This allows a user unfamiliar with the spatial library to do a Google site search (available from the GIS TReC s website) on these files to gain some idea of the available data. We recognize that the current approach is not optimal for all of these reasons, and suspect that other geospatial librarians may have similar problems. Hence, the focus of this research is to develop a geospatial data search tool using a relational database implementation of the FGDC CSDGM model. The goal of this tool is to allow one to easily query all of the metadata describing datasets in the geospatial library through a simple, fast, and powerful search tool. To determine the types of users of our spatial library, we generated several statistical reports using the software, Flashstats ( http://www.maximized.com/). From these reports,

Figure 2: Current spatial data search Figure 3: Schematic of how a client currently finds the data we found that 26.3% of clients accessed the GIS TReC spatial data library from within the Idaho State University internet domain (isu.edu). An additional 0.43% of the clients were from local state agencies, and 2.4% of clients searched the spatial library from workstations within the GIS TReC itself. These data indicate that about 28.7% of all of our users are students, faculty members, and GIS professionals. Such users are likely to a) know what data they need from our library; and b) have the ability to easily locate it using the current directory browsing access method. However the remaining users (71.3%) who come from outside the university (e.g. from the Geography Network) and the local professional community are surely not as familiar with the GIS servers and especially the organization of the spatial library. Hence we expect that an improved search interface will better serve this large segment of the user community and indeed should expand the user base for this particular regional geospatial data library. It is very likely that other research centers and data archives are used similarly and could also benefit from the development of tools for rapidly searching and accessing geospatial metadata and raw data. 1.2 An FGDC Based Metadata Database Approach To manage and support sophisticated geospatial data searches, we developed a relational database to act as the backbone against which the search interface executes. Our relational database design is intended to improve data discovery and delivery by including all information captured within geospatial metadata documents. The Federal Geographic Data Committee (FGDC) geospatial metadata format was chosen for the base metadata standard for this project.

The FGDC metadata standard (FGDC 2000) describes GIS datasets using seven major sections and three supporting sections. The seven major sections are: 1) identification, 2) data quality, 3) spatial data organization, 4) spatial reference, 5) entity and attribute, 6) distribution, 7) metadata reference information. The three supporting sections are divided into 1) citation, 2) time period, 3) contact information. In addition to this information, the relational database developed for this project also contains the name and URL of the geospatial dataset, along with descriptive keywords, to better facilitate searching and download. Using a custom search interface developed with Active Server Page (ASP) technology, all geospatial metadata information for the entire library (and potentially outside the library) becomes immediately viewable without the user having to download and extract the dataset. 1.3 Anticipated Benefits In addition to the improvement in search and query functionality for end-users, an additional benefit of this approach will be the reduction in overall disk space usage and the removal of redundant stand-alone HTML-based metadata files. We anticipate this will also help improve long-term maintenance of the archive. (Figure 2) Figure 4: Reduction in Redundant Documentation

2.0 Project Details 2.1 Objectives The objectives of this project include: 1) Improve data discovery: To improve the current search engine of the GIS TReC to help clients more efficiently find required geospatial data and understand the structure of metadata. 2) Develop a robust relational database design using UML CASE visual modeling: design a relational database (Rational Rose software was used to create the UML model) to contain the information within FGDC geospatial metadata documents (extracted from ArcXML documents). The database will include the name and the path to each geospatial dataset, along with descriptive keywords, to better facilitate searching. The database was designed following a top-down approach and normalized through third form normal. The UML design is available at http://giscenter-sl.isu.edu/umlmodel/index.htm 3) Automate database population: Create an XML Parser to parse geospatial metadata (stored within ArcXML documents) and automatically populate the relational database described above. 4) Develop an intelligent web interface to facilitate effective use of the database: Develop an intelligent web interface (that has the capability to query the spatial data with topology keywords, such as, intersect, contain, and adjacent and view the metadata in XML style sheet before downloading) to assist clients searching for data available in the GIS TReC spatial library. This will be accomplished using ASP, HTML, and Javascript. 5) Make the entire process open to the GIS Community: Organize and documents all materials describing database design, XML parser design, and Active Server Page design to assist others in the GIS community. 2.2 Database Design Using a UML CASE Class Diagram At the outset, a database schema or design did not exist that would support geospatial metadata documents. For this reason, the first step was a careful database design normalized to third form normal. Unified Modeling Language (UML) is a standard diagramming language used to design database schemas among other things. One software tool used to create UML models is Rational Rose (Wendy and Michael, 1999). To manage and support sophisticated searches, a relational database with tables that describe GIS datasets based on the seven major sections of the FGDC metadata standard (FGDC 2000) was created using Rational Rose. This database design includes all information captured within geospatial metadata documents, the name and file path to the geospatial dataset, along with descriptive keywords (Figure 6). The advantage of using a UML model is its ability to transition into a physical database using the UML Class diagram model. Changes can be made to the model and exported

Figure 5: The UML Class diagram for the relational database developed using Rational Rose. using the XML (Extensible Markup Language) file (consisting of UML XMI (XML Metadata Interchange)) which facilitates transfer of UML diagrams to other software applications (such as ArcCatalog). When a change occurs to the model, Rose can modify the code to incorporate the change (Wendy and Michael, 1999), this helps to ensure that the model is resilient and flexible. After exporting the XMI file from Rational Rose, a Microsoft Access relational database (though another RDBMS could just as easily be used) was created using the CASE Schema tool from ArcCatalog. The UML design is available at http://giscentersl.isu.edu/umlmodel/index.htm 2.3 Development of ArcXML Parser Software To populate the relational database, all information stored in the HTML-based metadata needed to be imported. Parsing such metadata files can be quite difficult because of the variations of formatting one might encounter in different HTML files. However, when the metadata is stored within the standard ArcXML file format, the process of parsing and importing metadata becomes greatly simplified. An initial XML File Loader application was written using Visual Basic 2005 to perform this function and is currently undergoing update and improvement, although it is already functional. Using the XMLTextReader function in Visual Basic 2005, the XML parser application imports ArcXML formatted metadata to the relational database. The goal of building the

XML parser software is to read an ArcXML metadata document, parse it, extract the values (fields and attributes) and write the data into the appropriate relational database tables. A Graphical User Interface (GUI) has been designed (figure 7) for the ArcXML parser program. The GUI includes basic functions (such as Open, Save, Add, Remove Files and Open a folder) (Figure 7) within two areas on the page: a file screen and a preview screen. The file screen displays the name of all XML files contained in the current workspace. The preview screen previews selected XML files along with their keys and values. Keys are objects that are used for selection during data retrievals. Figure 7: Screenshot of XML Parser 2.4 Development of an Intelligent Web Search Interface The GIS TReC currently offers two search options to their web clients. The first is a manual search and the second is a simple search powered by Google. The current search engines have limitations and problems which cannot fully facilitate geospatial data discovery and delivery needs, namely a connection between search HTML metadata files and download of the related dataset. Microsoft products or indexing services are also not sufficient as these do not reveal files stored within ZIP files. Because of the size of geospatial data, much of it has been bundled and compressed in ZIP format. This unfortunately, hides much of the real data from clients. In order to locate and retrieve data, an intelligent web interface was required enabling clients to choose or enter search

criteria and preview the metadata (all done within the database) without having to download the dataset or try to memorize paths to the required data files. Search engine tools are becoming common and most search through the meta-databases against meta descriptions of their geospatial data range. A natural approach is to add advanced features to search engines that allow users to express constraints or preferences in an intuitive manner, resulting in the desired information to be returned among the first results. In fact, search engines have added a variety of such features, often under a special advanced search interface, though mostly limited to fairly simple conditions on domain, link structure, or last modifications date (Markowetz et al, 2005). The proposed method of data retrieval will contain features that allow clients to enter keywords or search phrases and also preferences or constraints such as, building topological relationships (intersect, contain, and adjacent) to locate the correct dataset. This feature will add more capability than the typcial geospatial data search engine (Figure 8). Figure 8: Schematic of new streamlined fashion of finding data 3.0 Conclusions To facilitate better discovery and delivery of geospatial data for GIS TReC clients, a robust relational database containing FDGC-compliant geospatial metadata was developed. To accomplish this, Rational Rose CASE software was used to create a UML class diagram of the relational database. The model was transitioned (using ArcCatalog s CASE schema tool) to a physical database currently using MS Access (though plans include a transition to IBM DB2 9.1 in the future). Currently, the relational database is being populated using ArcXML metadata documents parsed using XML Parser software

developed by the author. Upon completion of this phase of the project, the database will be coupled to an intelligent web interface that searchers for data using geospatial operators. Initially the metadata within this database will describe geospatial data within the Idaho State University GIS TReC spatial library only. However, once initial implementation is completed, the database will be opened and members of the GIS community will be able to post metadata describing geospatial data from around the world. This distributed design (potentially mirrored on other servers) where only the geospatial metadata is stored within the database (along with a URL path to the data itself) will help facilitate data discovery, sharing, and the development of the GeoWeb. Acknowledgements: This study was made possible by a grant from the National Aeronautics and Space Administration Goddard Space Flight Center which was made possible through efforts of the Idaho congressional delegation. References Boggs, Michael & Boggs, Wendy (1999). A Tour of Rose. Mastering UML with Rational Rose (pp.34). California: SYBEX. Bradley, Julia & Millspaugh, Anita (2003). Programming in Visual Basic.Net. New York, NY: McGraw-Hill Higher Education. C.J.Date (2003). Relational Systems And Others. An Introduction to Database Systems (8 th ed.). Addison Wesley. ESRI, Environmental System Research Institute. ESRI Profile of the Content Standard for Digital Geospatial Metadata., March 2003. Retrieved 27 September, 2006 from http://www.esri.com/metadata/esriprof80.html FGDC (2000). Content Standard for Digital Geospatial Metadata Workbook. Federal Geographic Data committee. Markowetz, Alexander, Yen-Yu Chen, suel, Torsten, Xiaohui Long, and Seeger, Bernhard (June, 2005). Design and Implementation of a Geographic Search Engine. Eighth International Workshop on the Web and Databases (WebDb 2005), Baltimore, Maryland, June 16-17, 2005. Schneider, David (2005). An Introduction to Programming Using Visual Basic 2005, sixth edition. Upper Saddle River, NJ: Pearson Prentice Hall.

Author Information Kit Na Goh Graduate Research Assistant, GIS Training and Research Center Idaho State University, 921 S. 8 th Ave Stop 8104, Pocatello, ID 83209 kigoh2001@yahoo.com Keith Weber, GISP GIS Director, GIS Training and Research Center Idaho State University, 921 S.8 th Ave Stop 8104, Pocatello, ID 83209 webekeit@isu.edu Daniel P. Ames, PhD, PE GIS Program Coordinator, Department of Geosciences Idaho State University, 1784 Science Center Dr., Idaho Falls, Idaho, 83402 amesdani@isu.edu