PROJECT REPORT. Infrastructure for Large-scale Data Resource Sharing & An Environmental Case Study


School of ITEE, The University of Queensland

Abstract. Large-scale data sharing is nontrivial because data sources are numerous, autonomous, and heterogeneous. This project represents our initial effort on this problem. The main goal of our project is to identify requirements, system architectures, and key technology barriers to establish an ICT infrastructure to support large-scale data resource sharing between research institutions. To achieve this goal, we first investigate a few main issues that are important for large-scale data sharing, such as interoperability, extensibility, and scalability, and highlight the need for synergies among several technologies: Grid, Peer-to-Peer (P2P), and data integration technologies. We then propose a novel service-oriented architecture designed specifically for large-scale data sharing. Finally, we give an environmental case study and report our experiences.

1 Motivation, Scope, and Goals

Modern, large-scale data sharing is typically characterized by the large volumes of data involved and the heterogeneity of the data sources accessed [23]. For example, in order to answer complex biological questions, biologists have to access and analyze large quantities of biological data stored in widely distributed repositories. These repositories, each making its own decisions about data storage and retrieval, are highly autonomous and heterogeneous. For instance, they may describe the same data objects using different representations, e.g., the protein sequence used in SWISS-PROT and the structure used in the Protein Data Bank (PDB). To share data under such circumstances, one possible approach is data replication. That is, data to be shared are first replicated to local repositories or a central repository before any processing (e.g., data mapping, transformation).
Though simple, such an approach suffers from clear limitations such as unnecessary bandwidth cost and high maintenance cost. Moreover, it is sometimes infeasible for privacy reasons. Data integration, on the other hand, avoids these limitations of data replication by allowing the flexible and managed federation, exploration, and processing of data from distributed sources [9]. Over the last decade, various communities have put much effort into data integration. However, supporting data sharing at large scale remains a great challenge.

To gain more insight into the problem, and thus facilitate developing sophisticated techniques for large-scale data sharing, we focus our study on the Environmental Science area. We chose Environmental Science as our focus area based on the following considerations: (1) it exemplifies this problem with non-trivial yet not overly complex data and models; (2) it is an area with a clear need for national and international collaboration, which has not yet become achievable due to issues, many of which modern information and communications technology is well positioned to address; (3) it is itself an important area, and our results can be applied to generate immediate and significant benefits; and (4) the Queensland EPA is a committed partner that provides large and complex real data, spatial models, an operational environment, user requirements, and domain expertise. Our research, however, is not limited to environmental sciences, but aims at supporting all data-intensive scientific research.

The main goal of our project is to identify requirements, system architectures, and key technology barriers to establish an ICT infrastructure to support large-scale data resource sharing between research institutions. Specifically, we hope to achieve the following:

- Insight into the important issues involved in large-scale data sharing;
- Insight into key technologies (e.g., their strengths and weaknesses) and their roles in large-scale data sharing;
- The design of a large-scale, general-purpose infrastructure to support data-intensive applications;
- A working prototype to support data sharing among a selected collection of geospatial data sources (centred around the WildNet database from the EPA).
To achieve the above, we first investigate a few main issues that are important for large-scale data sharing, such as interoperability, extensibility, and scalability, and highlight the need for synergies among Grid, P2P, and data integration technologies: by combining Grid and data integration technologies, we facilitate interoperability among heterogeneous data sources; by integrating P2P technologies into both Grid and data integration technologies, we improve the extensibility and scalability of data sharing. We then propose a novel service-oriented architecture based on these technologies, designed specifically for large-scale data sharing. Finally, we give an environmental case study and report our experiences.

The rest of the report is organized as follows: Section 2 describes the state of the art of large-scale data sharing, including important technologies, their roles, and recent efforts; Section 3 investigates the main issues involved in large-scale data sharing, and highlights the need for technology integration; Section 4 presents the proposed architecture; Section 5 gives an environmental case study; and Section 6 summarizes what we have achieved and points out future work.

2 The State of the Art

In this section, we review the state of the art of large-scale data sharing, including important technologies (i.e., Grid, P2P, and data integration technologies) and recent efforts in integrating these technologies for large-scale data sharing.

2.1 Grid Technologies

Grid technologies and infrastructures aim at supporting coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations [12]. The Open Grid Services Architecture (OGSA) [10] is designed to facilitate interoperability among different Grid deployments; it aligns Grid technologies with Web Services technologies and introduces a service-oriented paradigm into the Grid. The first formal and technical specification of OGSA is the Open Grid Services Infrastructure (OGSI) [32], which has several implementations such as Globus Toolkit 3.0 (GT3) [14]. Currently, OGSI is evolving towards the Web Services Resource Framework (WSRF) [38] to embrace new Web Services standards.

OGSA and OGSI

OGSA adopts a common representation for all resources (e.g., computational and storage resources, programs, databases): each resource in OGSA is represented as a Grid service, i.e., a Web service that provides a set of well-defined interfaces and follows specific conventions [11]. OGSI further specifies the basic interfaces (or porttypes) to be implemented by Grid services, such as GridService (GS), Factory, ServiceGroup, and so on. Most of these interfaces are optional; only GridService is mandatory and must be implemented by all Grid services. Depending on which interfaces are implemented, Grid services with different functionalities result. Grid services can be instantiated dynamically. Each instantiation of a Grid service generates a Grid service instance, which is identified by a Grid Service Handle (GSH) and a Grid Service Reference (GSR). The GSH is invariant, whereas the GSR is stateful and can change over the lifetime of a service instance.
To create a service instance, a Grid service called a factory, which implements the Factory interface, is invoked. Services and factories are located through another Grid service, called a registry, which implements the ServiceGroup (SGR) porttype.

OGSA-DAI/DQP

Both OGSA-DAI and OGSA-DQP build upon OGSA. The main objective of OGSA-DAI/DQP is to provide a uniform service interface for data access and integration over Grids. OGSA-DAI extends Grid services with new services and porttypes for individual data access, such as Grid Data Service (GDS), Grid Data Transport (GDT), Grid Data Service Factory (GDSF), and DAI Service Group Registry (DAISGR). The Grid Data Service is the primary OGSA-DAI service; it supports data access through the GDS porttype (via the perform operation) and data delivery through the GDT porttype. GDS instances are created by invoking a GDSF, which can be located through a DAISGR. OGSA-DQP extends OGSA-DAI with two new services (and their corresponding factories) for distributed query processing over multiple data sources: a Grid Distributed Query Service (GDQS), which compiles, optimises, partitions, and schedules distributed query execution plans over multiple execution nodes in the Grid, and a Grid Query Evaluation Service (GQES), which is in charge of a partition of the query execution plan assigned by a GDQS. In the GDQS, a Grid Distributed Query (GDQ) porttype is added for importing source schemas. OGSA-DQP itself does not do any schema mediation.
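The factory, registry, and handle/reference roles described above for OGSI can be illustrated with a small sketch. This is not OGSI or Globus code: the class names, the `gsh://` handle format, and the `rebind` method are hypothetical and serve only to show an invariant handle alongside a reference that may change.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ServiceInstance:
    handle: str     # plays the role of the GSH: fixed for the instance lifetime
    reference: str  # plays the role of the GSR: binding details that may change


class Factory:
    """Toy stand-in for a service that implements a Factory interface."""

    def __init__(self, kind):
        self.kind = kind
        self._ids = itertools.count(1)
        self.instances = {}

    def create_service(self):
        n = next(self._ids)
        inst = ServiceInstance(
            handle=f"gsh://{self.kind}/{n}",          # hypothetical handle format
            reference=f"host-a:8080/{self.kind}/{n}",  # hypothetical binding
        )
        self.instances[inst.handle] = inst
        return inst

    def rebind(self, handle, new_reference):
        # The reference is replaced; the handle stays the same.
        self.instances[handle].reference = new_reference


@dataclass
class Registry:
    """Toy stand-in for a registry service used to locate factories."""

    factories: dict = field(default_factory=dict)

    def register(self, kind, factory):
        self.factories[kind] = factory

    def find(self, kind):
        return self.factories[kind]


registry = Registry()
registry.register("gds", Factory("gds"))
gds_factory = registry.find("gds")       # locate the factory via the registry
instance = gds_factory.create_service()  # create a service instance
gds_factory.rebind(instance.handle, "host-b:8080/gds/1")
```

In this sketch a client keeps only the handle and resolves it again whenever the reference changes, which is the reason for separating the two identifiers.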

2.2 P2P Technologies

P2P technologies share the same final objective as Grid technologies, i.e., to pool large sets of resources. However, they address different requirements and thus take different design approaches. In general, P2P technologies focus more on decentralization and scalability, while Grid technologies focus more on providing various complex services. Three main classes of P2P systems have emerged so far: distributed computing, file sharing, and collaborative systems, among which file sharing systems are the most studied. Based on whether there is any constraint on network topology or on data placement, file sharing systems are further classified into two main kinds: unstructured (e.g., [15]) and structured (e.g., [30]). Our work is most closely related to super-peer networks [39], a kind of unstructured network that strikes a balance between the inherent search efficiency of centralized systems and the robustness of decentralized systems. This kind of network can also take advantage of the heterogeneity among the capabilities of participating peers. In a super-peer network, some peers with more capability (e.g., more bandwidth or CPU) take on the role of super-peers and act as servers to a set of clients (peers with less capability) in the network. A good survey of P2P technologies appears in [24].

2.3 Data Integration Technologies

Data integration technologies have been extensively studied over the last decade. Traditionally, in the database community, data integration systems are characterized by an architecture based on a global schema and a set of sources, and a crucial aspect of these systems is modelling the relation between the sources and the global schema [20]. Two approaches have been proposed: one is global-as-view (GAV) [34], where the global schema is expressed in terms of the sources; the other is local-as-view (LAV) [17], where each source is defined as a view over the global schema.
Regardless of the approach used, during query processing, a query posed over the global schema needs to be reformulated in terms of a set of queries over the sources. A fundamental operation related to modelling is schema matching, where a match operation is a function that takes two schemas as input and returns a match result which includes a set of mapping elements matching elements of one schema to the other schema. [28] gives a survey of approaches to automatic schema matching. However, schemas may have some semantics that affects the matching criteria but is not adequately captured or formally expressed. In such a case, two semantically related schemas may seem unrelated. One solution to this is using ontologies. An ontology is a formal, explicit specification of a shared conceptualization [16]. In particular, ontologies are used to capture some shared understanding of a domain: main concepts and their important relationships. By using ontology-based approaches for data annotation, it becomes easier to achieve semantic integration. A single ontology for all is desirable, but unrealistic. Multiple ontologies may be developed either independently or based on a common upper ontology, and mappings between them need to be established to facilitate interoperability. [26] provides a brief survey of ontology-based approaches, and [19] reviews the state of the art of ontology mapping.
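The match operation described above, which takes two schemas and returns a set of mapping elements, can be sketched with a purely name-based matcher. This is a minimal illustration rather than any of the matchers surveyed in [28]: the `match_schemas` function, the 0.7 threshold, and the two example schemas are assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case an element name and drop common separators."""
    return name.lower().replace("_", "").replace("-", "")


def match_schemas(schema_a, schema_b, threshold=0.7):
    """Return mapping elements (element_a, element_b, score).

    Each element of schema_a is paired with its most similar element of
    schema_b, and the pair is kept only if the similarity reaches threshold.
    """
    mappings = []
    for a in schema_a:
        best, best_score = None, 0.0
        for b in schema_b:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score > best_score:
                best, best_score = b, score
        if best is not None and best_score >= threshold:
            mappings.append((a, best, round(best_score, 2)))
    return mappings


# Hypothetical source schemas describing wildlife sightings.
source_1 = ["common_name", "scientific_name", "latitude", "longitude"]
source_2 = ["CommonName", "SciName", "lat", "lon", "family"]
result = match_schemas(source_1, source_2)
```

With these inputs only the pair `common_name` / `CommonName` clears the threshold. Abbreviated names such as `lat` and `SciName` are not matched to `latitude` and `scientific_name`, which is the situation the paragraph describes: schemas that are semantically related but appear unrelated to a matcher that sees only names.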

Fig. 1. The interoperability issue (data interoperability: semantics, schema, format; system interoperability: language, platform, remote execution method, protocol)

2.4 Recent Efforts Towards Large-scale Data Sharing

Recent efforts on data sharing in heterogeneous environments generally seek some kind of synergy among the above technologies. For example, [29] considers integrating P2P and Grid technologies and realizing a service-oriented ad hoc Grid by implementing P2P-based node discovery, property assessment, and service deployment. [18] considers integrating P2P and data integration technologies, extending traditional data integration by introducing a P2P architecture into schema mediation among data sources. Others, such as [31, 37, 36, 1], consider integrating Grid and data integration technologies: [31] describes a system that integrates distributed metadata catalogs on the Grid; [37, 36] describe the Grid Data Mediation Service (GDMS) system, which presents distributed, heterogeneous data sources as one logical virtual data source on the Grid; and [1] proposes a service-based approach to schema federation composed of three services: schema translation, schema matching, and schema mapping. The main limitation of these proposals is that they largely ignore the extensibility and scalability issues that are important for large-scale data sharing. Others, such as [3, 5, 4], consider integrating all three technologies for large-scale data sharing, as we do in our work. However, unlike our work, they still depend on centralized mechanisms for service discovery, which limits their scalability.

3 Main Issues

Many issues are involved in large-scale data sharing, such as interoperability, extensibility, scalability, and security. Rather than covering all of them, in this section we focus on a few important issues in order to highlight the need for synergies among Grid, P2P, and data integration technologies.

3.1 Interoperability

In a distributed environment, data sources may present various degrees of heterogeneity across systems and data. In such an environment, one of the most important issues to be addressed is interoperability. We differentiate two kinds of interoperability, system-interoperability and data-interoperability (as shown in Figure 1). System-interoperability deals with system heterogeneities such as differences in system platforms (e.g., Windows, Unix), protocols (e.g., http, ftp), remote execution methods (e.g., Java RMI, CORBA), and programming languages, while data-interoperability deals with data heterogeneities such as differences in data formats (e.g., relational databases, XML, flat files), schemas (e.g., the same objects are described using different structures or terminologies), and semantics (e.g., object semantics may be captured to different degrees, so domain or expert knowledge is needed to relate two objects). By interoperability, we mean that diverse data sources are integrated not only at the system level (to achieve system-interoperability), but also at the data level (to achieve data-interoperability).

Grid technologies and infrastructures address the interoperability issue only in a limited scope, with more focus on system-interoperability than on data-interoperability. By adopting a uniform service-oriented model, all components of the network are made virtual: resources are represented as Grid services, which provide some capability through the exchange of messages over the network using platform- and language-neutral protocols. One promising way to improve interoperability among heterogeneous data sources is to integrate Grid technologies with data integration technologies, which have been extensively studied in various communities. In contrast to Grid technologies, data integration technologies focus more on data-interoperability than on system-interoperability.
In the database community, data integration and exchange between heterogeneous data sources is achieved through a logical mediated schema: mappings are established between the mediated schema and the schemas at the data sources, and queries are posed over the mediated schema and evaluated over the underlying data sources. To match different schemas, various kinds of individual matchers, which can be used alone or together, are employed; they consider schema-level information, instance-level information, or both. Because schemas do not express semantics adequately, domain or expert knowledge may sometimes be needed to link data sources that are seemingly unconnected [21]. In [22], for example, the mediator approach to data integration is augmented with an explicit representation of domain knowledge in the form of one or more ontologies. Techniques developed in the ontology community for semantic integration can be shared and reused in the database community for more automatic schema matching.

3.2 Extensibility and Scalability

Two other important issues to be addressed for large-scale data sharing are extensibility and scalability. Here, extensibility refers to the ability to add or remove data sources with minimal effort, or to easily accommodate changes in data sources, and scalability refers to the ability to adapt to an increased number of data sources. These two

issues need to be addressed mainly because of the large-scale and autonomous nature of data sources. Any centralized solution for coordinating large numbers of autonomous data sources is undesirable, as it cannot deal well with the ad hoc sharing of large numbers of data sources, which may appear, disappear, or change their contents. Today's Grid technologies are inadequate in addressing these two issues. Though it is possible to improve system scalability through their resource management and scheduling, this improved scalability is compromised by the fact that resources in Grids are managed in either a centralized or a hierarchical manner, which leaves Grid technologies unable to cope and scale well with large numbers of dynamic data sources. Though more interoperability can be achieved, as indicated earlier, by combining Grid and data integration technologies, the extensibility and scalability issues persist, and may even get worse: the mediated schema must be designed carefully and globally before any data sharing, and data sources cannot change significantly or they might violate the mappings to the mediated schema [18]. In other words, like Grid technologies, traditional data integration technologies also neglect the ad hoc extensibility and scalability issues that are important for large-scale data sharing. Thus, both Grid technologies and traditional data integration technologies are insufficient for a large-scale data sharing environment. To address this, while retaining the benefits of integrating both technologies (e.g., interoperability), we use P2P technologies and adopt a P2P model for both, which provides benefits that would otherwise be unavailable, i.e., extensibility and scalability.
To combine P2P technologies with Grid technologies, we organize resources (or services) in Grids in a P2P manner for scalable service discovery and deployment: each service (peer) is connected to a set of other services (neighbors); given a request, a service first checks whether it is itself the required service; if not, it forwards the request to its neighbors, and so on, until the request is satisfied. By doing so, we avoid centralized management of resources. Data integration can also be done in a P2P manner. For example, rather than defining a global mediated schema, we can build semantic mappings directly between the schemas of different sources (footnote 1). Each data source corresponds to a peer, which maintains a few mappings with other peers (neighbors). Given a query, a peer first transforms the query based on the mappings maintained locally, then forwards the reformulated query to semantically related neighbors. In this way, data are integrated through collaboration between peers. From the above analysis, we can see that integrating Grid, P2P, and data integration technologies offers much potential for large-scale data sharing. In the next section, we describe in detail how these three technologies are integrated.

Footnote 1: We may use ontology-based domain knowledge for building semantic mappings. In case multiple ontologies exist, ontology mappings are built first, also in a P2P manner.

4 A Service-oriented Architecture for Large-scale Data Sharing

In this section, we present the proposed architecture, which is designed specifically for large-scale data sharing. The architecture, based on the integration of Grid, P2P, and

data integration technologies, is service-oriented, and abstracted into four layers (as shown in Figure 2):

- Data layer: this layer consists of a set of autonomous, distributed data sources. Each data source makes its own decisions about its system and data, and great heterogeneity may exist among different sources.
- Grid layer: this layer hides the heterogeneities exposed in the data layer, and presents a uniform view (i.e., services) of all resources to the upper layer. All resources are exposed as Grid services except data resources, which are exposed as Data Access Services (DASs).
- P2P layer: this layer organizes services using P2P models to support decentralized service discovery and schema mediation. Note that the distinction between this layer and the Grid layer may not be obvious, and sometimes these two layers may be mixed together (e.g., a Grid service is implemented in a P2P mode).
- Application layer: this layer performs data-intensive operations, e.g., data analysis possibly spanning multiple data sources.

Fig. 2. An architecture for large-scale data sharing (from top to bottom: Application layer with data analysis, simulation, ...; P2P layer with DAS-S, DAS-M, Registry, and Mediation; Grid layer with OGSA-DQP and OGSA-DAI; Data layer with Data Sources X, Y, and Z)

More details about the architecture are given in the following subsections.

4.1 Data Access Services

We differentiate two kinds of data access services: DAS-S for access to a single data source, and DAS-M for access spanning multiple data sources. Both DAS-S and

DAS-M extend the functionalities of OGSA-DAI and OGSA-DQP. Besides implementing the GDS and GDT porttypes from OGSA-DAI and the GDQ porttype from OGSA-DQP, DAS-S and DAS-M extend the GridService porttype by adding findneighbors, setneighbors, and findservicedatawithttl, where findneighbors and setneighbors are used to get and set neighbor services respectively, and findservicedatawithttl is used to query information about a service within a given Time-To-Live (TTL). In addition, DAS-M introduces two new porttypes to support distributed data integration: Build Mappings (BM) and Query Reformulation (QR). The BM porttype is used for establishing semantic mappings between the mediated schema and the input schemas, which are either mediated schemas themselves or source schemas (schemas of data sources), with the help of Schema Matching Services (SMS) (footnote 2) for schema matching between the input schemas. The QR porttype is used for reformulating a query posed over the mediated schema based on the established mappings (footnote 3). The functionalities of both DAS-M and SMS are encapsulated in the Mediation module in Figure 2. Among other things, schemas (source schemas or mediated schemas) are exposed as Service Data Elements (SDEs) by DAS-S and DAS-M.

4.2 Service Organization and Discovery

Since all resources in Grids are represented as services, discovering the required services in a large-scale ad hoc Grid is an issue. As mentioned earlier, a centralized discovery mechanism is not desirable. In this part, we describe a decentralized service discovery mechanism. We employ a super-peer network [39] to organize services.
Specifically, each service in Grids corresponds to a peer; for data access, a peer may provide data (through its DAS-S service) or act as a mediator (through its DAS-M service). There are two kinds of peers in the network, super-peers and their clients (often simply called peers); a super-peer acts both as a server to the peers within its group (a peer group consists of a super-peer and its peers) and as an equal to other super-peers within the network. When a peer joins a peer group, it registers some service metadata with its super-peer. To find a service, a peer sends a request (through the findservicedatawithttl operation) to its super-peer. The super-peer searches its service registry and, at the same time, forwards the request to its super-peer neighbors with the TTL decreased by 1. The process is repeated until the TTL is equal to 0. Usually, services that implement the ServiceGroup porttype act as super-peers. Peer groups are formed based on Virtual Organizations (VOs), with one peer group per VO. Alternatively, peer groups could be formed based on the semantic closeness of services for more efficient service discovery, that is, services that are semantically close are clustered together. We leave this as future work.

Footnote 2: As there may be many SMSs, each of which uses different matching methods, SMSs are designed not to be tied to any data access services.

Footnote 3: Concept-based queries can also be supported when ontologies are used, which requires additional steps in QR for query transformation.
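The TTL-bounded discovery among super-peers described in Section 4.2 can be sketched as follows. This is an illustration under assumptions, not the project's implementation: the `SuperPeer` class and the `find_service` function are hypothetical names, the search proceeds one hop at a time, and a `seen` set (not mentioned in the text) stops a request from visiting the same super-peer twice.

```python
from dataclasses import dataclass, field


@dataclass
class SuperPeer:
    name: str
    registry: dict = field(default_factory=dict)   # service name -> metadata
    neighbors: list = field(default_factory=list)  # neighboring super-peers

    def register(self, service, metadata):
        """Called when a peer joins the group and registers its service."""
        self.registry[service] = metadata


def find_service(start, service, ttl):
    """Search start's registry, then forward hop by hop until the TTL is 0."""
    hits = []
    seen = {start.name}  # assumption: avoid revisiting super-peers
    frontier = [start]
    while frontier:
        next_frontier = []
        for peer in frontier:
            if service in peer.registry:
                hits.append((peer.name, peer.registry[service]))
            if ttl > 0:  # forward only while the TTL has not run out
                for neighbor in peer.neighbors:
                    if neighbor.name not in seen:
                        seen.add(neighbor.name)
                        next_frontier.append(neighbor)
        frontier = next_frontier
        ttl -= 1  # each forwarding step decreases the TTL by 1
    return hits


# Three super-peers in a chain: A - B - C. Only C knows the bird service.
a, b, c = SuperPeer("A"), SuperPeer("B"), SuperPeer("C")
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
c.register("bird-sightings", {"type": "DAS-S"})

near = find_service(a, "bird-sightings", ttl=1)  # reaches A and B only
far = find_service(a, "bird-sightings", ttl=2)   # also reaches C
```

The TTL bounds how far a request spreads, so a small TTL keeps traffic low at the cost of possibly missing services registered with distant super-peers.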

Fig. 3. Service interactions during service initialization and set-up (participants: Client, Registry, Factory (DASMF), DAS-M, and SMS; interactions numbered 1 to 6)

4.3 Service Interaction

In this part, we describe how services interact in two typical scenarios: (1) when a data access service is initialized and set up; (2) when a query is submitted.

Service Initialization and Set-up

Compared to DAS-S, the initialization and set-up of DAS-M is more complex (DAS-S only needs the first few service interactions of DAS-M). Thus, in the following, we focus on DAS-M only. Figure 3 roughly illustrates the service interactions for the initialization and set-up of DAS-M: (1) a DAS-M factory (DASMF) registers itself with its super-peer (registry) when initialized; (2) a client discovers the DASMF for service instance creation through the findservicedatawithttl operation on the super-peer; (3) the client creates the DAS-M instance through the createservice operation of the DASMF; (4) the DAS-M imports schemas for mediation through the importschema operation in the GDQ porttype; (5) the SMS imports schemas for matching through the importschema operation; (6) the DAS-M builds mappings via the BM porttype based on the matching result returned from the SMS. Note that whenever a service is initialized, a service interaction like interaction 1 above is invoked, and whenever a service instance is created, service interactions like interactions 2 and 3 above are invoked.

Query Processing

Suppose a query is posed over a mediated schema which involves two data sources.
Figure 4 shows the service interactions during query processing: (1) a query is submitted through the perform operation; (2) the DAS-M reformulates the query via the QR porttype based on the mappings built during the set-up; (3) the reformulated query is passed to the GDQS through the perform operation; (4) the GDQS compiles the query into a distributed query plan, creates a GQES for each partition of the query plan, and passes the corresponding partition to it; (5) each GQES instance starts the

evaluation, and interacts with the GDS; (6) results are propagated back to the client via the GDT porttype.

Fig. 4. Service interactions during query processing (participants: Client, DAS-M, GDQS, GQES instances, and GDSs; interactions numbered 1 to 6)

Query formulation is easy with our architecture, as a client can import the schemas of any data access services within its peer group after requesting service information from its super-peer (as mentioned in Section 4.2, a super-peer maintains metadata about the services within its peer group, so it has enough knowledge about these services).

5 An Environmental Case Study

We begin our work with an environmental case study, which is centered around WildNet, an existing research and development program in the Queensland Environmental Protection Agency (EPA).

5.1 WildNet and Datasets

The WildNet database contains 3.5 million records of wildlife sightings and listings of around 20,000 species such as plants, mammals, birds, reptiles, amphibians, freshwater fish, marine cartilaginous fish, and butterflies in Queensland. Species are classified by a taxonomy with multiple levels: kingdom names, class names, family names, scientific names, and common names. The main feature of WildNet is that it maintains a large store of ecological data which depends heavily on many other services and is itself a service to many other applications. One fundamental function required by WildNet is sightings visualization, i.e., sightings can be visualized on the map for a selected area such as a protected area, forestry area, or local government area, or a defined area

bounded by minimum and maximum latitudes and longitudes. Beyond this, it is desirable to model species distribution and to identify abnormalities or outliers by analysing the sightings data together with environmental data such as vegetation data or climate profiles. To realize the above functionalities, we include the following datasets in our case study:

- Snake sightings data in the southeast Queensland region, provided by the Queensland EPA;
- Bird sightings data along the Queensland coastline, provided by the Queensland EPA;
- Bird taxonomy data, extracted from the Australian Museum via BioMaps [2];
- Weather data, downloaded from the Australian Bureau of Meteorology.

5.2 The Implementation Architecture

The implementation is based on Open Geospatial Consortium (OGC) [27] standards, as the data involved in our case study have a geographic or spatial nature. OGC is a nonprofit, international, voluntary consensus standards organization that is leading the development of standards for geospatial and location-based services. Currently, the OGC has developed a number of web services specifications that enable the interoperability of geospatial data sources in a distributed environment, such as Web Coverage Services (WCS) [8], Web Feature Services (WFS) [35], Web Map Services (WMS) [6], and Catalogue Services (CSW) [25]. Among them, WFS, WMS, and WCS specify the interfaces for access to geospatial data sources, and CSW specifies the interfaces through which other services can be published and discovered. Figure 5 shows the implementation architecture. Due to the time constraints of this one-year project, the current implementation is OGC-compliant only, i.e., data sources are exposed as OGC Web Feature Services. Grid and P2P elements can be included in future development. In the implementation, we use two machines, A and B: machine A stores snake sightings data, bird taxonomy data, and weather data; machine B stores bird sightings data.
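As an illustration of how a data source exposed as a Web Feature Service is accessed, the sketch below builds WFS requests as HTTP GET URLs with key-value parameters. GetCapabilities, DescribeFeatureType, and GetFeature are the operation names used in the OGC WFS specification; the endpoint URL, the feature type name `epa:bird_sightings`, the WFS version, and the bounding box values are hypothetical and chosen only for the example.

```python
from urllib.parse import urlencode


def wfs_url(endpoint, request, version="1.1.0", **params):
    """Build a WFS key-value-pair GET URL for the given operation."""
    query = {"service": "WFS", "version": version, "request": request}
    query.update(params)
    return endpoint + "?" + urlencode(query)


# Hypothetical endpoint standing in for the WFS server on machine B.
ENDPOINT = "http://machine-b.example.org/geoserver/wfs"

# Ask the server which feature types it offers.
capabilities_url = wfs_url(ENDPOINT, "GetCapabilities")

# Ask for the structure (attributes) of one feature type.
describe_url = wfs_url(ENDPOINT, "DescribeFeatureType",
                       typeName="epa:bird_sightings")

# Fetch features inside a bounding box given as minx,miny,maxx,maxy.
# Axis order depends on the WFS version and coordinate reference system.
features_url = wfs_url(ENDPOINT, "GetFeature",
                       typeName="epa:bird_sightings",
                       bbox="152.5,-28.0,153.5,-27.0")
```

A client would issue these URLs with any HTTP library and parse the XML or GML response; the sketch stops at URL construction so that it does not depend on a running server.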
A new data source always publishes its services to the catalogue server (step (1)). During a search (step (2)), the catalogue server is first contacted for related data sources (step (3)); then the search request is dispatched to these data sources (step (4)); and finally the data are accessed (step (5)).

5.3 Service Development

We chose GeoServer [13] for the server-side WFS server development and uDig [33] for the client-side interface development. The WFS implementation was developed using Java Servlet technology under Eclipse + Tomcat: basic WFS requests (e.g., getcapability, getfeaturedescription, getfeature) are supported through HTTP GET/POST, and results in XML or XML-compliant GML formats are returned, as shown in Figure 6. During the implementation, we found that OGC tooling was still immature: there was no open source software for catalogue service development. Also, the OGC specification for the catalogue service includes many more details than we need for our prototype, and much more effort and time would need to be invested to implement a fully OGC-compliant catalogue

13 Client (2) Mediation (3) Catalogue Service (4) (4) WFS (1) WFS (1) (5) (5) Machine A Machine B Fig. 5. The implementation architecture Client HTTP Get/Post GML file Data Resource Access (Servlet) HTTP Get/Post Response WFS Server Fig. 6. The WFS implemenation service from scratch. More importantly, we expect that the Grid registry service can be extended for geospatial data sources in the future development, we decided to implement a simple catalogue service which can meet our current needs but may not follow the OGC specifications. For the catalogue service development, first, we decided the contents of the metadata database. The design details are as follows: each interested feature type is registered into a table (as shown in Figure 7), and each feature attribute of each feature type is registered into a separate table. For each feature type, there is a BBOX attribute, which is a geometric data type (a Polygon) defining the geographic bounding box or the boundary of the feature type. With such a design, we can query the metadata of a feature based on its geographic location or spatial relationship with other feature types. The URL and Namespace decide where to get the feature type online. Next, we built the spatial-compliant metadata database. Two approaches have been tried: the first approach is storing all metadata information in a XML-compliant GML file, and then treat this GML file as a datastore in GeoServer and utilize GeoServer s WFS function to support the spatial query capabilities. This idea comes from deegree catalogue server [7]. However, after testing on GeoServer, we found the result is not ideal due to GeoServer is not well support to GML yet at the moment. The second approach is building a spatial database using PostGIS. PostGIS is a free spatial extension of PostgreSQL allowing building spatial record with friendly user interface. This approach is successful, and we have created a sample metadata database.
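The metadata design described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: it uses Python with an in-memory SQLite database instead of PostGIS, and it stores the bounding box as four numeric columns instead of a Polygon geometry so that the overlap test can be written in plain SQL. Table names, column names, URLs, namespaces, and coordinates are all hypothetical.

```python
import sqlite3

# Two tables, mirroring the feature-type and attribute tables of the design.
# The bounding box is stored as min/max columns (an assumption; the report
# uses a PostGIS Polygon for the BBOX attribute).
SCHEMA = """
CREATE TABLE feature_type (
    oid INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    min_lon REAL, min_lat REAL, max_lon REAL, max_lat REAL,
    abstract TEXT, keyword TEXT, url TEXT, namespace TEXT
);
CREATE TABLE attribute (
    name TEXT NOT NULL,
    data_type TEXT,
    abstract TEXT,
    feature_type_id INTEGER REFERENCES feature_type(oid)
);
"""


def create_catalogue():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def register(conn, name, bbox, keyword, url, namespace, attributes=()):
    """Register one feature type and its attributes; return its OID."""
    cur = conn.execute(
        "INSERT INTO feature_type (name, min_lon, min_lat, max_lon, max_lat,"
        " keyword, url, namespace) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (name, *bbox, keyword, url, namespace),
    )
    oid = cur.lastrowid
    conn.executemany(
        "INSERT INTO attribute (name, data_type, feature_type_id)"
        " VALUES (?, ?, ?)",
        [(attr, dtype, oid) for attr, dtype in attributes],
    )
    return oid


def search_by_keyword(conn, keyword):
    rows = conn.execute(
        "SELECT name FROM feature_type WHERE keyword = ? ORDER BY name",
        (keyword,),
    )
    return [row[0] for row in rows]


def search_by_bbox(conn, bbox):
    """Return feature types whose bounding box overlaps the given box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    rows = conn.execute(
        "SELECT name FROM feature_type"
        " WHERE min_lon <= ? AND max_lon >= ?"
        " AND min_lat <= ? AND max_lat >= ? ORDER BY name",
        (max_lon, min_lon, max_lat, min_lat),
    )
    return [row[0] for row in rows]


# Illustrative registrations; all values below are made up for the example.
conn = create_catalogue()
register(conn, "snake_sightings", (152.0, -28.5, 153.5, -26.5), "snake",
         "http://machine-a.example.org/wfs", "casestudy",
         [("species", "string"), ("location", "point")])
register(conn, "bird_sightings", (138.0, -29.0, 154.0, -10.0), "bird",
         "http://machine-b.example.org/wfs", "casestudy")
register(conn, "bird_taxonomy", (112.0, -44.0, 154.0, -10.0), "bird",
         "http://machine-a.example.org/wfs", "casestudy")

print(search_by_keyword(conn, "snake"))
print(search_by_bbox(conn, (145.0, -17.0, 146.0, -16.0)))
```

Two boxes overlap when each one starts before the other ends on both axes, which is what the four comparisons in `search_by_bbox` express. With a spatial database such as PostGIS, the same test would normally be delegated to a spatial operator on the geometry column.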

Fig. 7. A simple metadata database for catalogue service (tables: FeatureType with OID, Name, BBOX, Abstract, Keyword, URL, Namespace; Attribute with Name, Data type, Abstract, Feature_Type_ID; one FeatureType relates to 0..m Attributes)

Finally, we implemented the catalogue service in Java. The catalogue server receives a catalogue query from a WFS server, translates the query into SQL to search its metadata database, and finally reconstructs the result and passes it back to the WFS server.

5.4 The Prototype

The prototype we built has two main functionalities:

Selective catalogue search. The catalogue service supports two kinds of search: keyword-based search and location-based search. For keyword-based search, given a keyword, only relevant data sources are returned from the catalogue search. Figure 8 shows the keyword input dialogue. When the input keyword is snake, only the snake source is returned, and when the input keyword is bird, only the bird data sources are returned, as shown in Figure 9 and Figure 10 respectively. For location-based search, we can specify an area of interest, and only the data sources whose data coverages overlap with the polygon selection are returned, as illustrated in Figure 11.

Data sharing based on WFS. Figure 12 to Figure 14 show the data sharing functionality of the prototype. Given a common name, for example, we can retrieve further information about the birds with that name, such as taxonomy information and the climate profile, even though this information is distributed across different data sources.

Fig. 8. The dialogue for keyword input
Fig. 9. The catalogue search result when input keyword is snake
Fig. 10. The catalogue search result when input keyword is bird
Fig. 11. The catalogue search result when a polygon selection is created
Fig. 12. The dialogue for bird name input
Fig. 13. The distribution of Australian White Ibis
Fig. 14. The climate profile for Australian White Ibis

6 Conclusion and Future Work

The main goal of this project is to identify requirements, system architectures, and key technology barriers to establish an ICT infrastructure to support large-scale data resource sharing between research institutions. To achieve this goal, we did the following:

Examined important issues for large-scale data sharing, such as interoperability, extensibility, and scalability;
Analyzed key technologies such as Grid, P2P, and data integration technologies, and pointed out that synergies among them are necessary to address the above issues;
Proposed a service-oriented architecture for large-scale data sharing, which is based on the integration of Grid, P2P, and data integration technologies;
Investigated an environmental case study, and built a working prototype to facilitate geospatial data sharing among several data sources centered around the WildNet database from the Queensland EPA.

Future work includes, besides the Grid and P2P extension of the current prototype, in-depth research on various topics in the data integration area, such as data cleaning, data reconciliation, and automatic semantic mapping building, as well as in-depth research on service composition, workflows, and so on.

Acknowledgements

This material is based upon the project supported by the Australian Research Council (ARC) under grant No. SR . We would like to thank Jack Fan Zhang for contributing to some of the development, Dr. David Pullar for the environmental data, and the Environmental Protection Agency (EPA) in Queensland for the sightings data.

References

1. L. A.-Hussaini, S. Viglas, and M. Atkinson. A service-based approach to schema federation of distributed databases. In Edinburgh E-Science Technical Report, EES , 2006.

2. BioMaps.
3. D. Calvanese, G. D. Giacomo, M. Lenzerini, R. Rosati, and G. Vetere. Hyper: A framework for peer-to-peer data integration on grids. In Proceedings of the International Conference on Semantics of a Networked World: Semantics for Grid Databases (ICSNW 2004).
4. C. Comito and D. Talia. Data integration and query reformulation in service-based grids: Architecture and roadmap. In CoreGRID Technical Report, TR-0013.
5. C. Comito, D. Talia, and P. Trunfio. Grid services: principles, implementations and use. In International Journal of Web and Grid Services, Vol. 1, No. 1.
6. J. de La Beaujardiere. OpenGIS Web Map Service (WMS) Implementation Specification, version .
7. Deegree.
8. J. Evans. OpenGIS Web Coverage Service (WCS) Implementation Specification, version .
9. I. Foster and R. L. Grossman. Data integration in a bandwidth-rich world. In Communications of the ACM, Volume 46, Issue 11.
10. I. Foster, C. Kesselman, J. Nick, and S. Tuecke. The physiology of the grid: An open grid services architecture for distributed systems integration. In Technical Report, Globus Project.
11. I. Foster, C. Kesselman, J. Nick, and S. Tuecke. The physiology of the grid: An open grid services architecture for distributed systems integration. In Globus Project.
12. I. Foster, C. Kesselman, and S. Tuecke. The anatomy of the grid: enabling scalable virtual organizations. In International Journal of Supercomputer Applications, 15(3).
13. GeoServer.
14. Globus Toolkit.
15. Gnutella.
16. T. R. Gruber. A translation approach to portable ontology specifications. In Knowledge Acquisition, 5(2).
17. A. Y. Halevy. Answering queries using views: A survey. In The VLDB Journal, Volume 10, Issue 4.
18. A. Y. Halevy, Z. G. Ives, D. Suciu, and I. Tatarinov. Schema mediation for large-scale semantic data sharing. In VLDB Journal.
19. Y. Kalfoglou and M. Schorlemmer. Ontology mapping: the state of the art. In The Knowledge Engineering Review, 18(1):1-31.
20. M. Lenzerini. Data integration: a theoretical perspective. In Proceedings of the 21st ACM Symposium on Principles of Database Systems.
21. B. Ludäscher, A. Gupta, and M. E. Martone. A model-based mediator system for scientific data management. In Z. Lacroix and T. Critchlow, editors, Bioinformatics: Managing Scientific Data. Morgan Kaufmann.
22. B. Ludäscher, K. Lin, S. Bowers, E. Jaeger-Frank, B. Brodaric, and C. Baru. Managing scientific data: From data integration to scientific workflows. In GSA Today, Special Issue on Geoinformatics.
23. D. A. Menasce. Scalable access to scientific data. In IEEE Internet Computing, May/June 2005 (Vol. 9, No. 3).
24. D. S. Milojicic, V. Kalogeraki, R. Lukose, K. Nagaraja, J. Pruyne, B. Richard, S. Rollins, and Z. Xu. Peer-to-peer computing. In HPL R1.
25. D. Nebert and A. Whiteside. OGC Catalogue Services Specifications, version .
26. N. F. Noy. Semantic integration: a survey of ontology-based approaches. In ACM SIGMOD Record, 33(4):65-70, 2004.
27. Open Geospatial Consortium.
28. E. Rahm and P. A. Bernstein. A survey of approaches to automatic schema matching. In VLDB Journal 10.
29. M. Smith, T. Friese, and B. Freisleben. Towards a service oriented ad-hoc grid. In Proceedings of 3rd International Symposium on Parallel and Distributed Computing.
30. I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of SIGCOMM.
31. R. Tuchinda, S. Thakkar, Y. Gil, and E. Deelman. Artemis: Integrating scientific data on the grid. In AAAI.
32. S. Tuecke, K. Czajkowski, I. Foster, J. Frey, S. Graham, C. Kesselman, T. Maguire, T. Sandholm, D. Snelling, and P. Vanderbilt. Open grid services infrastructure (OGSI) version 1.0. In Global Grid Forum Draft Recommendation.
33. uDig.
34. J. D. Ullman. Information integration using logical views. In Proceedings of the 6th International Conference on Database Theory (ICDT).
35. P. Vretanos. OpenGIS Web Feature Service (WFS) Implementation Specification, version .
36. A. Wöhrer, P. Brezany, and A. M. Tjoa. Virtualization of heterogeneous data sources for grid information systems. In MIPRO.
37. A. Wöhrer, P. Brezany, and A. M. Tjoa. Novel mediator architectures for grid information systems. In Future Generation Computer Systems, 21(1).
38. WSRF.
39. B. Yang and H. Garcia-Molina. Designing a super-peer network. In Proceedings of the 18th International Conference on Data Engineering (ICDE), 2003.


More information

Research on the Interoperability Architecture of the Digital Library Grid

Research on the Interoperability Architecture of the Digital Library Grid Research on the Interoperability Architecture of the Digital Library Grid HaoPan Department of information management, Beijing Institute of Petrochemical Technology, China, 102600 bjpanhao@163.com Abstract.

More information

Recommendations of the ad-hoc XML Working Group To the CIO Council s EIEIT Committee May 18, 2000

Recommendations of the ad-hoc XML Working Group To the CIO Council s EIEIT Committee May 18, 2000 Recommendations of the ad-hoc XML Working Group To the CIO Council s EIEIT Committee May 18, 2000 Extensible Markup Language (XML) is being widely implemented and holds great potential to enhance interoperability

More information

Research on Firewall in Software Defined Network

Research on Firewall in Software Defined Network Advances in Computer, Signals and Systems (2018) 2: 1-7 Clausius Scientific Press, Canada Research on Firewall in Software Defined Cunqun Fan a, Manyun Lin, Xiangang Zhao, Lizi Xie, Xi Zhang b,* National

More information

GWD-I (draft-ggf-dais -dataservices-01) Data Access and Integration Services (DAIS) -wg J.

GWD-I (draft-ggf-dais -dataservices-01) Data Access and Integration Services (DAIS)  -wg J. GWD-I (draft-ggf-dais -dataservices-01) Access and Integration Services (DAIS) http://forge.ggf.org/projects/dais -wg Editors: I. Foster, ANL S. Tuecke, ANL J. Unger, IBM August 14, 2003 OGSA Services

More information

IT Infrastructure for BIM and GIS 3D Data, Semantics, and Workflows

IT Infrastructure for BIM and GIS 3D Data, Semantics, and Workflows IT Infrastructure for BIM and GIS 3D Data, Semantics, and Workflows Hans Viehmann Product Manager EMEA ORACLE Corporation November 23, 2017 @SpatialHannes Safe Harbor Statement The following is intended

More information

CSE 5306 Distributed Systems. Course Introduction

CSE 5306 Distributed Systems. Course Introduction CSE 5306 Distributed Systems Course Introduction 1 Instructor and TA Dr. Donggang Liu @ CSE Web: http://ranger.uta.edu/~dliu Email: dliu@uta.edu Phone: 817-2720741 Office: ERB 555 Office hours: Tus/Ths

More information

Agent-Enabling Transformation of E-Commerce Portals with Web Services

Agent-Enabling Transformation of E-Commerce Portals with Web Services Agent-Enabling Transformation of E-Commerce Portals with Web Services Dr. David B. Ulmer CTO Sotheby s New York, NY 10021, USA Dr. Lixin Tao Professor Pace University Pleasantville, NY 10570, USA Abstract:

More information

A5.2-D3 [3.5] Workflow Design and Construction Service Component Specification. Eva Klien (FHG), Christine Giger (ETHZ), Dániel Kristóf (FOMI)

A5.2-D3 [3.5] Workflow Design and Construction Service Component Specification. Eva Klien (FHG), Christine Giger (ETHZ), Dániel Kristóf (FOMI) Title: A5.2-D3 [3.0] A Lightweight Introduction to the HUMBOLDT Framework V3.0 Author(s)/Organisation(s): Daniel Fitzner (FhG), Thorsten Reitz (FhG) Working Group: Architecture Team / WP5 References: A5.2-D3

More information

Ontology-Based Schema Integration

Ontology-Based Schema Integration Ontology-Based Schema Integration Zdeňka Linková Institute of Computer Science, Academy of Sciences of the Czech Republic Pod Vodárenskou věží 2, 182 07 Prague 8, Czech Republic linkova@cs.cas.cz Department

More information

Grid Services and the Globus Toolkit

Grid Services and the Globus Toolkit Grid Services and the Globus Toolkit Lisa Childers childers@mcs.anl.gov The Globus Alliance Copyright (C) 2003 University of Chicago and The University of Southern California. All Rights Reserved. This

More information

Grid Infrastructure Monitoring Service Framework Jiro/JMX Based Implementation

Grid Infrastructure Monitoring Service Framework Jiro/JMX Based Implementation URL: http://www.elsevier.nl/locate/entcs/volume82.html 12 pages Grid Infrastructure Monitoring Service Framework Jiro/JMX Based Implementation Bartosz Lawniczek, Grzegorz Majka, Pawe l S lowikowski, Krzysztof

More information

Bruce Wright, John Ward, Malcolm Field, Met Office, United Kingdom

Bruce Wright, John Ward, Malcolm Field, Met Office, United Kingdom The Met Office s Logical Store Bruce Wright, John Ward, Malcolm Field, Met Office, United Kingdom Background are the lifeblood of the Met Office. However, over time, the organic, un-governed growth of

More information

AOTO: Adaptive Overlay Topology Optimization in Unstructured P2P Systems

AOTO: Adaptive Overlay Topology Optimization in Unstructured P2P Systems AOTO: Adaptive Overlay Topology Optimization in Unstructured P2P Systems Yunhao Liu, Zhenyun Zhuang, Li Xiao Department of Computer Science and Engineering Michigan State University East Lansing, MI 48824

More information

Joining the BRICKS Network - A Piece of Cake

Joining the BRICKS Network - A Piece of Cake Joining the BRICKS Network - A Piece of Cake Robert Hecht and Bernhard Haslhofer 1 ARC Seibersdorf research - Research Studios Studio Digital Memory Engineering Thurngasse 8, A-1090 Wien, Austria {robert.hecht

More information

PortalU, a Tool to Support the Implementation of the Shared Environmental Information System (SEIS) in Germany

PortalU, a Tool to Support the Implementation of the Shared Environmental Information System (SEIS) in Germany European conference of the Czech Presidency of the Council of the EU TOWARDS eenvironment Opportunities of SEIS and SISE: Integrating Environmental Knowledge in Europe http:/www.e-envi2009.org/proceedings/

More information

DataONE: Open Persistent Access to Earth Observational Data

DataONE: Open Persistent Access to Earth Observational Data Open Persistent Access to al Robert J. Sandusky, UIC University of Illinois at Chicago The Net Partners Update: ONE and the Conservancy December 14, 2009 Outline NSF s Net Program ONE Introduction Motivating

More information

Implementing the Army Net Centric Data Strategy in a Service Oriented Environment

Implementing the Army Net Centric Data Strategy in a Service Oriented Environment Implementing the Army Net Centric Strategy in a Service Oriented Environment Michelle Dirner Army Net Centric Strategy (ANCDS) Center of Excellence (CoE) Service Team Lead RDECOM CERDEC SED in support

More information

ERDAS APOLLO Managing and Serving Geospatial Information

ERDAS APOLLO Managing and Serving Geospatial Information ERDAS APOLLO Managing and Serving Geospatial Information ERDAS APOLLO Do you have large volumes of geospatial information, regularly updated data stores, and a distributed user base? Do you need a single,

More information

Lily: Ontology Alignment Results for OAEI 2009

Lily: Ontology Alignment Results for OAEI 2009 Lily: Ontology Alignment Results for OAEI 2009 Peng Wang 1, Baowen Xu 2,3 1 College of Software Engineering, Southeast University, China 2 State Key Laboratory for Novel Software Technology, Nanjing University,

More information

Grid Computing with Voyager

Grid Computing with Voyager Grid Computing with Voyager By Saikumar Dubugunta Recursion Software, Inc. September 28, 2005 TABLE OF CONTENTS Introduction... 1 Using Voyager for Grid Computing... 2 Voyager Core Components... 3 Code

More information

A Time-To-Live Based Reservation Algorithm on Fully Decentralized Resource Discovery in Grid Computing

A Time-To-Live Based Reservation Algorithm on Fully Decentralized Resource Discovery in Grid Computing A Time-To-Live Based Reservation Algorithm on Fully Decentralized Resource Discovery in Grid Computing Sanya Tangpongprasit, Takahiro Katagiri, Hiroki Honda, Toshitsugu Yuba Graduate School of Information

More information

Developing a Grid-Based Search and Categorization Tool

Developing a Grid-Based Search and Categorization Tool Abstract: High Energy Physics Libraries Webzine Issue 8 / October 2003 Developing a Grid-Based Search and Categorization Tool Glenn Haya (*), Frank Scholze (*), Jens Vigen (*) Grid technology has the potential

More information

UNICORE Globus: Interoperability of Grid Infrastructures

UNICORE Globus: Interoperability of Grid Infrastructures UNICORE : Interoperability of Grid Infrastructures Michael Rambadt Philipp Wieder Central Institute for Applied Mathematics (ZAM) Research Centre Juelich D 52425 Juelich, Germany Phone: +49 2461 612057

More information

Designing a System Engineering Environment in a structured way

Designing a System Engineering Environment in a structured way Designing a System Engineering Environment in a structured way Anna Todino Ivo Viglietti Bruno Tranchero Leonardo-Finmeccanica Aircraft Division Torino, Italy Copyright held by the authors. Rubén de Juan

More information

Relation between Geospatial information projects related to GBIF

Relation between Geospatial information projects related to GBIF Relation between Geospatial information projects related to GBIF Synthesys 3.6-Synthesys 3.7-GBIF.DE- BioGeomancer The most up to date work can always be found at: http://www.biogeografia.com/synthesys

More information

For each use case, the business need, usage scenario and derived requirements are stated. 1.1 USE CASE 1: EXPLORE AND SEARCH FOR SEMANTIC ASSESTS

For each use case, the business need, usage scenario and derived requirements are stated. 1.1 USE CASE 1: EXPLORE AND SEARCH FOR SEMANTIC ASSESTS 1 1. USE CASES For each use case, the business need, usage scenario and derived requirements are stated. 1.1 USE CASE 1: EXPLORE AND SEARCH FOR SEMANTIC ASSESTS Business need: Users need to be able to

More information

The GeoPortal Cookbook Tutorial

The GeoPortal Cookbook Tutorial The GeoPortal Cookbook Tutorial Wim Hugo SAEON/ SAEOS SCOPE OF DISCUSSION Background and Additional Resources Context and Concepts The Main Components of a GeoPortal Architecture Implementation Options

More information

Grid Computing Fall 2005 Lecture 5: Grid Architecture and Globus. Gabrielle Allen

Grid Computing Fall 2005 Lecture 5: Grid Architecture and Globus. Gabrielle Allen Grid Computing 7700 Fall 2005 Lecture 5: Grid Architecture and Globus Gabrielle Allen allen@bit.csc.lsu.edu http://www.cct.lsu.edu/~gallen Concrete Example I have a source file Main.F on machine A, an

More information

EarthCube and Cyberinfrastructure for the Earth Sciences: Lessons and Perspective from OpenTopography

EarthCube and Cyberinfrastructure for the Earth Sciences: Lessons and Perspective from OpenTopography EarthCube and Cyberinfrastructure for the Earth Sciences: Lessons and Perspective from OpenTopography Christopher Crosby, San Diego Supercomputer Center J Ramon Arrowsmith, Arizona State University Chaitan

More information

Wide Area Query Systems The Hydra of Databases

Wide Area Query Systems The Hydra of Databases Wide Area Query Systems The Hydra of Databases Stonebraker et al. 96 Gribble et al. 02 Zachary G. Ives University of Pennsylvania January 21, 2003 CIS 650 Data Sharing and the Web The Vision A World Wide

More information

Semantic Web Mining and its application in Human Resource Management

Semantic Web Mining and its application in Human Resource Management International Journal of Computer Science & Management Studies, Vol. 11, Issue 02, August 2011 60 Semantic Web Mining and its application in Human Resource Management Ridhika Malik 1, Kunjana Vasudev 2

More information