A Framework for Ready Accessibility to Geospatial Data Using the WWW
Bradford G. Nickerson*, Ying Teng, Jun Xiao, Lushu Li
Presentation to the Digital Earth 2001 Conference, Monday, June 25, 2001, Fredericton, N.B., Canada (abstract number de-a-099)
* Contact person: Faculty of Computer Science, University of New Brunswick, P.O. Box 4400, Fredericton, N.B., Canada, bgn@unb.ca
Architectural Overview
[Figure: Java-capable web browsers on a desktop PC and a laptop send queries over the Internet to the Catalog Server (database of ISO XML metadata), which returns server URLs for Geospatial Data Server 1 and Geospatial Data Server 2, each backed by its own database]
Three Tier Architecture
- First tier (client tier): Java-capable web browser running GSDWApplet, CLIMapApplet and TiffyApplet
- Second tier (middle or application tier): Context Data Server; a web server whose server container hosts the Query Servlet and the GSDIndex; a web server hosting the Geospatial Data Service Servlet
- Third tier (data storage tier): databases of GSHHS & GMT data, ISO XML metadata, and geospatial data
Building the GSDIndex (Catalog Server)
- Phase 1: source metadata is converted to ISO XML metadata. Metadata (CEONet data) goes through XMLMetadataTranslator, vector data (CLI data) through CLIMetadataBuilder, and raster data (CCRS data) through CCRSMetadataBuilder.
- Phase 2: R_TreeBuilder creates or updates the bounding box R-tree index; AVL_TreeBuilder creates or updates the string (keyword) AVL-tree index.
- Phase 3: GSDIndexBuilder combines both indexes into the GSDIndex.
GSDIndex Object Structure
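The two components built in Phase 2 can be sketched as one Java object. This is a minimal illustration under stated assumptions, not the authors' GSDIndex: the class and method names are hypothetical, a `TreeMap` (a red-black tree) stands in for the AVL-tree keyword index, and a linear scan over bounding boxes stands in for the R-tree. Both substitutes give the same answers, only without the R-tree's search speed.

```java
import java.util.*;

// Minimal sketch of the two GSDIndex components (hypothetical names, not the authors' code).
public class GsdIndexSketch {
    // Minimum bounding rectangle in degrees: [west, east] x [south, north].
    // Assumes 0-360 longitudes with no wrap across the 0/360 meridian.
    static final class Mbr {
        final double west, east, south, north;

        Mbr(double west, double east, double south, double north) {
            this.west = west;
            this.east = east;
            this.south = south;
            this.north = north;
        }

        boolean intersects(Mbr o) {
            return west <= o.east && o.west <= east && south <= o.north && o.south <= north;
        }
    }

    // Keyword index: TreeMap (a red-black tree) used here in place of the AVL-tree.
    private final TreeMap<String, TreeSet<String>> keywordIndex = new TreeMap<>();
    // Spatial index: a map scanned linearly, used here in place of the R-tree.
    private final Map<String, Mbr> boxes = new LinkedHashMap<>();

    // Phase 2: insert one metadata record's keywords and bounding box.
    void add(String id, Collection<String> keywords, Mbr box) {
        for (String kw : keywords) {
            keywordIndex.computeIfAbsent(kw.toLowerCase(Locale.ROOT), key -> new TreeSet<>()).add(id);
        }
        boxes.put(id, box);
    }

    // Case-insensitive keyword lookup; returns a copy so callers cannot alter the index.
    Set<String> byKeyword(String kw) {
        return new TreeSet<>(keywordIndex.getOrDefault(kw.toLowerCase(Locale.ROOT), new TreeSet<>()));
    }

    // Bounding box lookup: every record whose MBR intersects the query rectangle.
    Set<String> byBox(Mbr query) {
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, Mbr> e : boxes.entrySet()) {
            if (e.getValue().intersects(query)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        GsdIndexSketch idx = new GsdIndexSketch();
        idx.add("cli-west-1", List.of("Land use"), new Mbr(235, 250, 49, 60));
        idx.add("ceonet-7", List.of("oceans", "earth science"), new Mbr(0, 360, -90, 90));
        System.out.println(idx.byKeyword("land use"));            // [cli-west-1]
        System.out.println(idx.byBox(new Mbr(225, 300, 40, 60))); // [ceonet-7, cli-west-1]
    }
}
```

Keeping both indexes in one object matches the slides' description of the GSDIndex being loaded by the servlet as a single serialized Java object.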
GSD Catalog Search Engine
- The GSD client sends a query (keywords, bounding box, match factor k) over the network via HTTP to the Query Servlet on the GSD Catalog Server.
- The GSD Index answers the bounding box from its R-tree index and the keywords from its string keyword AVL-tree index.
- The R-tree search result and the AVL search result are intersected to give the GSD search result.
- The search result is postprocessed into a dynamic HTML document and returned to the client.
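The intersect and postprocess steps can be sketched in Java as follows. This is a minimal illustration, not the authors' Query Servlet. The class name, the string data set IDs and the HTML layout are hypothetical, and the match factor test is left out.

```java
import java.util.*;

// Sketch of combining the two index results and rendering them (hypothetical names).
public class CatalogSearchSketch {
    // Intersect the keyword (AVL) hits with the spatial (R-tree) hits; sorted for stable output.
    static SortedSet<String> intersect(Collection<String> keywordHits, Collection<String> spatialHits) {
        SortedSet<String> result = new TreeSet<>(keywordHits);
        result.retainAll(new HashSet<>(spatialHits));
        return result;
    }

    // Minimal HTML escaping for text placed inside elements.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Postprocess: render the combined result as a small dynamic HTML document.
    static String toHtml(SortedSet<String> ids) {
        StringBuilder sb = new StringBuilder("<html><body><p>")
                .append(ids.size()).append(" data sets found</p><ul>");
        for (String id : ids) {
            sb.append("<li>").append(escape(id)).append("</li>");
        }
        return sb.append("</ul></body></html>").toString();
    }

    public static void main(String[] args) {
        List<String> avlHits = List.of("cli-a021g", "ceonet-17", "ceonet-42");
        List<String> rtreeHits = List.of("ceonet-42", "cli-a021g", "ccrs-3");
        // <html><body><p>2 data sets found</p><ul><li>ceonet-42</li><li>cli-a021g</li></ul></body></html>
        System.out.println(toHtml(intersect(avlHits, rtreeHits)));
    }
}
```

Intersecting two independently computed ID sets keeps the R-tree and the AVL-tree separate. The trade-off is that each query pays for both full searches even when one of them is very selective.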
Catalog Server Search Engine
The combined AVL tree and R-tree search returns the data objects satisfying
$\min\left\{ \frac{S(O)}{S(D)},\ \frac{S(O)}{S(Q)} \right\} \ge k$
where
S(D) = size (area) of the bounding box (MBR) of the data object,
S(Q) = size (area) of the query rectangle,
S(O) = size (area) of the overlap of the query rectangle and the MBR of the data object,
k = match factor.
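The match factor test can be written directly from the definitions of S(D), S(Q), S(O) and k. The sketch below assumes axis-aligned rectangles in a plain x/y plane. It uses a hypothetical `Rect` type and makes no attempt to handle longitude wrap-around.

```java
// Sketch of the match factor test min{S(O)/S(D), S(O)/S(Q)} >= k (hypothetical names).
public class MatchFactor {
    // Axis-aligned rectangle [xmin, xmax] x [ymin, ymax].
    static final class Rect {
        final double xmin, xmax, ymin, ymax;

        Rect(double xmin, double xmax, double ymin, double ymax) {
            this.xmin = xmin;
            this.xmax = xmax;
            this.ymin = ymin;
            this.ymax = ymax;
        }

        double area() {
            return Math.max(0.0, xmax - xmin) * Math.max(0.0, ymax - ymin);
        }
    }

    // S(O): area of the overlap of query Q and data MBR D (0 if they are disjoint).
    static double overlapArea(Rect q, Rect d) {
        double w = Math.min(q.xmax, d.xmax) - Math.max(q.xmin, d.xmin);
        double h = Math.min(q.ymax, d.ymax) - Math.max(q.ymin, d.ymin);
        return (w > 0 && h > 0) ? w * h : 0.0;
    }

    // min{S(O)/S(D), S(O)/S(Q)}; zero overlap (including zero-area MBRs) scores 0.
    static double matchScore(Rect q, Rect d) {
        double so = overlapArea(q, d);
        if (so == 0.0) {
            return 0.0;
        }
        return Math.min(so / d.area(), so / q.area());
    }

    static boolean matches(Rect q, Rect d, double k) {
        return matchScore(q, d) >= k;
    }

    public static void main(String[] args) {
        Rect q = new Rect(0, 10, 0, 10);          // S(Q) = 100
        Rect big = new Rect(-45, 55, -45, 55);    // S(D) = 10000, covers Q, so S(O) = 100
        Rect shifted = new Rect(5, 15, 0, 10);    // S(D) = 100, S(O) = 50
        System.out.println(matchScore(q, big));     // 0.01
        System.out.println(matchScore(q, shifted)); // 0.5
    }
}
```

Taking the minimum of the two ratios penalizes both failure modes. A data set much larger than the query has a small S(O)/S(D), and one much smaller has a small S(O)/S(Q).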
Effect of Match Factor k
[Figure: query rectangle Q and data sets D1 to D6 overlapping Q to different degrees]
Min{S(O)/S(D), S(O)/S(Q)} per data set: D1 = 0.1, D2 = 0.625, D3 = 0.2, D4 = 0.0, D5 = 0.5, D6 = 0.025
No. | Range of k | Returned data sets
1 | (0, 0.025] | D1, D2, D3, D5, D6
2 | (0.025, 0.1] | D1, D2, D3, D5
3 | (0.1, 0.2] | D2, D3, D5
4 | (0.2, 0.5] | D2, D5
5 | (0.5, 0.625] | D2
6 | (0.625, 1] | null
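The returned sets in this slide follow directly from the per-data-set scores: a data set is returned when its score is at least k. The sketch below hard-codes those scores instead of computing them from geometry. The class name is hypothetical.

```java
import java.util.*;

// Reproduces the match factor table from the per-data-set scores (hypothetical class name).
public class MatchFactorTable {
    // min{S(O)/S(D), S(O)/S(Q)} for D1..D6, as listed on the slide.
    static final double[] SCORES = {0.1, 0.625, 0.2, 0.0, 0.5, 0.025};

    // Data sets whose score is >= k. k is expected to be > 0; at k = 0 the
    // non-overlapping D4 (score 0.0) would also be returned.
    static List<String> returned(double k) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < SCORES.length; i++) {
            if (SCORES[i] >= k) {
                out.add("D" + (i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One k taken from each range of the table.
        double[] ks = {0.025, 0.1, 0.2, 0.5, 0.625, 0.7};
        for (double k : ks) {
            System.out.println(k + " -> " + returned(k));
        }
    }
}
```

The last line prints an empty list for k = 0.7, which corresponds to the table's "null" row.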
Graphical User Interface
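Client Component Communication
[Figure: GSDWApplet sends a query to the Catalog Server (database) and receives the query result; a CLI data file name is passed through the Java servlet ShowCLIMap to CLIMapApplet, and a CCRS data file name through the Java servlet ShowCCRSMap to TiffyApplet; CLI data and CCRS data are held in separate databases]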
Context Data Experiments
- Testing compared Java Servlet, CORBA (Java), CORBA (C++) and RMI (Remote Method Invocation) implementations of the context data server.
- Data used: the world's shoreline database GSHHS (Global Self-consistent, Hierarchical, High-resolution Shoreline) at crude, low, intermediate, high and full resolution [Wessel and Smith, 1996], plus world political borders and rivers from the Generic Mapping Tools [Wessel and Smith, 1999].
- Client: send request to server; get data from server; draw map on screen.
- Server: read data from files; clip polygons/polylines; transform coordinates; send data to client.
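The slides do not say which clipping algorithm the context data servers use. As an assumption, the sketch below uses the Liang-Barsky parametric method to clip each polyline segment against a rectangular map window. It then reassembles the visible parts into separate pieces, because a shoreline that leaves and re-enters the window becomes two polylines. Polygons would need a polygon clipper such as Sutherland-Hodgman, which is not shown. Class and method names are hypothetical.

```java
import java.util.*;

// Polyline clipping against a map window, using Liang-Barsky (an assumed choice).
public class SegmentClip {
    // Clips (x0,y0)-(x1,y1) to [xmin,xmax] x [ymin,ymax]; returns {ax, ay, bx, by} or null.
    static double[] clip(double x0, double y0, double x1, double y1,
                         double xmin, double xmax, double ymin, double ymax) {
        double dx = x1 - x0, dy = y1 - y0;
        double[] p = {-dx, dx, -dy, dy};
        double[] q = {x0 - xmin, xmax - x0, y0 - ymin, ymax - y0};
        double t0 = 0.0, t1 = 1.0;
        for (int i = 0; i < 4; i++) {
            if (p[i] == 0.0) {
                if (q[i] < 0.0) return null;     // parallel to this edge and outside it
            } else {
                double t = q[i] / p[i];
                if (p[i] < 0.0) {                // entering across this edge
                    if (t > t1) return null;
                    if (t > t0) t0 = t;
                } else {                         // leaving across this edge
                    if (t < t0) return null;
                    if (t < t1) t1 = t;
                }
            }
        }
        // Unclipped endpoints are returned exactly so consecutive segments join cleanly.
        double ax = (t0 == 0.0) ? x0 : x0 + t0 * dx;
        double ay = (t0 == 0.0) ? y0 : y0 + t0 * dy;
        double bx = (t1 == 1.0) ? x1 : x0 + t1 * dx;
        double by = (t1 == 1.0) ? y1 : y0 + t1 * dy;
        return new double[] {ax, ay, bx, by};
    }

    // Clips the polyline (x[i], y[i]) and returns its visible pieces as point lists.
    static List<List<double[]>> clipPolyline(double[] x, double[] y,
                                             double xmin, double xmax, double ymin, double ymax) {
        List<List<double[]>> pieces = new ArrayList<>();
        List<double[]> current = null;
        for (int i = 0; i + 1 < x.length; i++) {
            double[] s = clip(x[i], y[i], x[i + 1], y[i + 1], xmin, xmax, ymin, ymax);
            if (s == null) {
                current = null;                  // segment invisible: the next visible part starts a new piece
                continue;
            }
            double[] a = {s[0], s[1]};
            double[] b = {s[2], s[3]};
            boolean continues = current != null && Arrays.equals(current.get(current.size() - 1), a);
            if (!continues) {
                current = new ArrayList<>();
                current.add(a);
                pieces.add(current);
            }
            current.add(b);
        }
        return pieces;
    }

    public static void main(String[] args) {
        double[] x = {-5, 5, 15, 5};
        double[] y = {5, 5, 5, 8};
        // Prints "(0.0, 5.0) (5.0, 5.0) (10.0, 5.0)" then "(10.0, 6.5) (5.0, 8.0)"
        for (List<double[]> piece : clipPolyline(x, y, 0, 10, 0, 10)) {
            StringBuilder sb = new StringBuilder();
            for (double[] pt : piece) {
                sb.append("(").append(pt[0]).append(", ").append(pt[1]).append(") ");
            }
            System.out.println(sb.toString().trim());
        }
    }
}
```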
Canada Land Inventory (CLI) Vector Data Server
- Thin client via fat server (TC/FS): CLIDataServer reads the raw CLI data, processes it into a Java serialized object, and transfers the Java object to CLIMapApplet over the Internet.
- Fat client via thin server (FC/TS): CLIDataServer reads the raw CLI data and transfers it unchanged to the client, where CLIMapApplet creates the Java object for display.
[Figure: GSDWApplet passes a file name to the servlet ShowCLIMap, which passes it to CLIMapApplet (1) or CLIMapApplet (2); each applet requests the file from its servlet CLIDataServer (1) or CLIDataServer (2), which serves distributed CLI data from its own database]
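In the TC/FS design the server sends a Java serialized object instead of the raw file. The sketch below shows that round trip with standard `java.io` object streams. The `CliLayer` class, its fields and the sample coordinates are hypothetical stand-ins for whatever display-ready structure CLIDataServer actually builds. In the FC/TS alternative, the server would send the raw .E00 bytes and the applet would build the object itself.

```java
import java.io.*;
import java.util.*;

// Sketch of the TC/FS transfer: serialize a processed layer on the server, rebuild it on the client.
public class SerializedLayerSketch {
    // Hypothetical display-ready layer built from raw CLI (.E00) data.
    static final class CliLayer implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final List<double[]> polylines;   // each array holds x0, y0, x1, y1, ...

        CliLayer(String name, List<double[]> polylines) {
            this.name = name;
            this.polylines = polylines;
        }
    }

    // Server side: serialize the processed object into the bytes sent over HTTP.
    static byte[] toBytes(CliLayer layer) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(layer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Client side: rebuild the object from the received bytes.
    static CliLayer fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (CliLayer) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<double[]> lines = new ArrayList<>();
        lines.add(new double[] {235.0, 49.0, 236.5, 49.2, 237.0, 50.1});
        byte[] wire = toBytes(new CliLayer("A011C", lines));
        CliLayer copy = fromBytes(wire);
        System.out.println(copy.name + ": " + copy.polylines.size() + " polyline(s), " + wire.length + " bytes");
    }
}
```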
Test Datasets
- Machine butter: GSHHS/GMT, CEONet, CLI (west, 942 files)
- Machine toast: CLI (east, 648 files), CCRS (45 files)
Test Datasets
Data set | Type/Format | Number of files | Total size (MB)
GSHHS/GMT | ASCII | 2,448 | 284
CEONet metadata | XML | 6,979 | 100
CLI | ARC/INFO Export (.E00) | 1,690 | 4,809
CCRS | Image (TIFF/JPEG) | 45 | 612
Totals | | 11,162 | 5,805
Test Environment
Host | Role | CPU | Memory/LAN | Operating system
butter | Catalog Server | PIII 750 MHz | 512 MB/100 Mbps | Windows 2000 Professional
butter | CLI Data Server (West) | PIII 750 MHz | 512 MB/100 Mbps | Windows 2000 Professional
toast | CLI Data Server (East) | PIII 750 MHz | 512 MB/100 Mbps | Windows NT Workstation 4.0
toast | CCRS Data Server | PIII 750 MHz | 512 MB/100 Mbps | Windows NT Workstation 4.0

Role | Software used
CORBA-based contextual data server | VisiBroker 4.1 (for Java and C++)
RMI-based contextual data server | JDK 1.2.2
Servlet-based contextual data server | BEA WebLogic Server 6.0
Query, CLIData and CCRSData server | BEA WebLogic Server 6.0
Client web browser | Microsoft IE 5.0
Preprocessing the GSDIndex
Initial build (for CEONet metadata):
Name | Build time (ms) | Save time (ms) | Total size (KB) | Height
R-Tree | 143,597 | 320 | 284 | 3
AVL-Tree | 207,811 | 2,914 | 3,211 | 16
Additional build (adding Canada Land Inventory (CLI) data):
Name | Load time | Build time (ms) | Save time (ms) | Total size (KB) | Height
R-Tree | 1.042 | 13,089 | 320 | 322 | 3
AVL-Tree | 13.389 | 25,277 | 5,178 | 3,600 | 16
Total GSDIndex size: 3,921 KB
Servlet load time (serialized Java object): 7,710 ms
Test Results: Contextual Data
Time for Catalog Search (k = 0.01; bounding boxes given as [longitude range; latitude range] in degrees)
Bounding box | Keywords | Items found | Time (ms) | Time/item (ms)
[0, 360; -90, 90] | earth science | 3,794 | 7,751 | 2.04
[0, 360; -90, 90] | oceans | 1,196 | 2,754 | 2.30
[0, 360; -90, 90] | Land use | 80 | 2,694 | 33.68
[0, 180; -90, 90] | earth science | 2,265 | 5,278 | 2.33
[0, 180; -90, 90] | oceans | 842 | 411 | 0.49
[0, 180; -90, 90] | Land use | 46 | 324 | 7.04
[225, 300; 40, 60] | earth science | 2,878 | 6,499 | 2.26
[225, 300; 40, 60] | oceans | 753 | 1,452 | 1.93
[225, 300; 40, 60] | Land use | 246 | 1,352 | 5.50
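The 6.4 ms per item figure in the conclusions can be recomputed from this table. It is the unweighted mean of the nine Time/Item ratios. Dividing total time by total items found gives about 2.36 ms per item instead, because the whole-globe "Land use" search (33.68 ms/item for only 80 items) weighs heavily in the unweighted mean. The short program below, with a hypothetical class name, computes both values from the table's counts and times.

```java
import java.util.Locale;

// Recomputes per-item catalog search times from the table (hypothetical class name).
public class CatalogSearchAverages {
    // Items found and total time (ms) for the nine searches, in table order (k = 0.01).
    static final int[] ITEMS = {3794, 1196, 80, 2265, 842, 46, 2878, 753, 246};
    static final int[] TIME_MS = {7751, 2754, 2694, 5278, 411, 324, 6499, 1452, 1352};

    // Unweighted mean of the per-search time/item ratios.
    static double meanOfRatios() {
        double sum = 0.0;
        for (int i = 0; i < ITEMS.length; i++) {
            sum += (double) TIME_MS[i] / ITEMS[i];
        }
        return sum / ITEMS.length;
    }

    // Pooled rate: total time divided by total items found.
    static double pooled() {
        long totalTime = 0, totalItems = 0;
        for (int i = 0; i < ITEMS.length; i++) {
            totalTime += TIME_MS[i];
            totalItems += ITEMS[i];
        }
        return (double) totalTime / totalItems;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "mean of ratios: %.2f ms/item%n", meanOfRatios()); // 6.40
        System.out.printf(Locale.ROOT, "pooled: %.2f ms/item%n", pooled());               // 2.36
    }
}
```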
Time for Vector Data Display
Test data | Data size (KB) | Time/KB, FC/TS (ms) | Time/KB, TC/FS (ms)
A021G.E00 | 5,495 | 2.3 | 74.6
A030M.E00 | 3,018 | 3.3 | 69.7
A012B.E00 | 1,845 | 3.0 | 71.7
A021K.E00 | 253 | 2.7 | 224.6
A011C.E00 | 19 | 17.3 | 379.5
Conclusions
- A three-tier client/server architecture was successful for building a web-based distributed geospatial data warehouse.
- For context data searches on the catalog server, CORBA and RMI are on average 2.2 times slower than Java Servlets.
- Match-factor-controlled search can significantly reduce the number of returned data sets whose extent differs greatly in size from the query window.
- Combined keyword and bounding box search on the catalog server requires an average of 6.4 ms per item found (unweighted average over nine very different searches).
- Fat client via thin server (FC/TS) is significantly faster than thin client via fat server (TC/FS) for vector dataset retrieval and display: on average 23 times faster over the five test datasets.
Future Work
- Data structures for efficient combined text and spatial range search (tries?)
- A geospatial web crawler to collect information from geospatial data servers and update the catalog server search index continuously and without supervision
- More detailed context data (e.g., Digital Chart of the World)
Acknowledgements
- GEOIDE Network of Centres of Excellence (Réseau de centres d'excellence), Geomatics for Informed Decisions (La géomatique pour des interventions et des décisions éclairées): opportunity to meet new researchers; financial support for project DEC#2, GEODEM: Designing the technological foundations of geospatial decision making with the World Wide Web
- University of New Brunswick
- Fuqun Zhou, Canada Centre for Remote Sensing