Methodological approach for cross-theme harmonization of Polish spatial data sets: the case study for the Annex I themes
Elżbieta Bielecka, Agnieszka Zwirowicz-Rutkowska, Alina Kmiecik, Marek Brylski, Magdalena Bednarek &
PRESENTATION SCOPE
Project scope & organization
Interoperability components in focus
Data harmonization process applied
Feedback from use-cases: Administrative Units, Addresses, Transport Networks
Summary
PROJECT GENERAL INFORMATION
WHAT: Harmonisation of spatial data and services for the geoportal2 project
WHY: Obtain INSPIRE-compliant data for the Annex I themes that are the responsibility of the Head Office of Geodesy and Cartography
WHEN: 14 months (December 2010 to February 2012)
WHO: Institute of Geodesy and Cartography, in cooperation with Intergraph
INSPIRE THEMES IN FOCUS
CRS: Coordinate Reference Systems
GGS: Geographical Grid Systems
GN: Geographical Names
AU: Administrative Units
CP: Cadastral Parcels
AD: Addresses
TN: Transport Networks
WORK ORGANIZATION
PACKAGE 1: PROJECT MANAGEMENT
PACKAGE 2: STRATEGY FOR DATA HARMONISATION
PACKAGE 3: TESTING LABORATORY
PACKAGE 4: TOOLS FOR HARMONIZATION
PACKAGE 5: HARMONIZED DATA
PACKAGE 6: METADATA PROFILE
PACKAGE 7: TOOLS FOR METADATA
PACKAGE 8: METADATA FOR HARMONIZED DATASETS
24/06/2010 INSPIRE Conference - Kraków
HARMONISATION SURROUNDINGS: FORMAL ISSUES
What themes are in focus?
Where are the cross-theme correspondences?
Which organisation is responsible for each theme?
Which data providers need to be involved?
Are agreements on data sharing/exchange needed?
Which organisation is the data integrator?
30/06/2011 INSPIRE Conference - Edinburgh
HARMONISATION SURROUNDINGS: ORGANISATIONAL ISSUES
Which data resources are relevant for the theme?
Are harmonized data a new dataset or just another encoding?
Where does the harmonized dataset reside?
What update cycle is needed?
What are the costs of the harmonization process?
HARMONISATION SURROUNDINGS: TECHNICAL ISSUES
Do we need a new data store, or do we transform on the fly?
Should data be corrected during the process?
What are the common harmonization rules?
What is an optimum harmonization process for the theme?
What are the process quality measures?
How to ensure repeatability of the process?
What are the publishing time slots for related themes?
What are the well-known namespaces for INSPIRE objects?
Can everyone transform the data?
Where are the application schemas and code lists located?
Do we need local registers?
What technical support / environment is needed?
INTEROPERABILITY COMPONENTS
Levels shown on the slide: INSPIRE LEVEL, NATIONAL LEVEL, THEME LEVEL
(A) PRINCIPLES
(B) TERMINOLOGY
(C) REFERENCE MODEL
(D) RULES FOR APPLICATION SCHEMAS AND FEATURE CATALOGUES
(E) SPATIAL AND TEMPORAL ASPECTS
(F) MULTI-LINGUAL TEXT AND CULTURAL ADAPTABILITY
(G) COORDINATE REFERENCING AND UNITS OF MEASUREMENT
(H) OBJECT REFERENCING MODELING
(I) IDENTIFIER MANAGEMENT
(J) DATA TRANSFORMATION
(K) PORTRAYAL MODEL
(L) REGISTERS AND REGISTRIES
(M) METADATA
(N) MAINTENANCE
(O) DATA AND INFORMATION QUALITY
(P) DATA TRANSFER
(Q) CONSISTENCY BETWEEN DATA
(R) MULTIPLE REPRESENTATIONS
(S) DATA CAPTURING RULES
(T) CONFORMANCE
DATA HARMONISATION PROCESS
ENVIRONMENTAL, TECHNICAL & ORGANISATIONAL ISSUES
PRODUCTION ENVIRONMENT
IDENTIFICATION OF SOURCE DATA: RESOURCE I, RESOURCE II, ...
HARMONISATION I: DATA CAPTURING RULES, DATA TRANSFORMATION, DATA QUALITY
HARMONISED DATA
HARMONISATION II: MAINTENANCE, DATA TRANSFER, CONFORMANCE, METADATA
AVAILABILITY ENVIRONMENT
DATA PROCESSING
DATA INTEGRATION: CONSISTENCY BETWEEN DATA, IDENTIFIERS MANAGEMENT, REGISTERS AND REGISTRIES
HARMONISATION III: PORTRAYAL MODEL, MULTIPLE REPRESENTATIONS, CONFORMANCE
INSPIRE CONFORMANT DATA FOR THEME: metadata, GML
DATA PUBLICATION: VIEW SERVICE, DIRECT DATA ACCESS, DOWNLOAD SERVICE, DISCOVERY SERVICE
MONITORING AND REPORTING OF NATIONAL INFRASTRUCTURE
IDENTIFICATION OF SOURCE DATA
THEME                          DATA SOURCE
Coordinate Reference Systems   N/A
Geographical Grid Systems      N/A
Geographical Names             National Registry of Geographical Names
Administrative Units           National Register of Administrative Boundaries
Addresses                      Topographical Database
Cadastral Parcels              Land Parcel Identification System
Transport Networks             Topographical Database; General Geographical Database 250 K
DATA PROCESSING - PROCEDURES
IDENTIFIED SOURCE DATASETS FOR THEME
NO MORE THAN ONE DATASET FOR THEME? YES
PROCEDURE FOR DATA HARMONIZATION (for each dataset): DATASET CHARACTERISTICS, TARGET MODEL CHARACTERISTICS, DATA CAPTURING RULES, CONFORMANCE, DATA QUALITY, METADATA, DATA TRANSFORMATION, MAINTENANCE, IDENTIFICATION OF TODO ACTIONS, SELECTION OF HARMONIZATION RULES
PROCEDURE FOR DATA INTEGRATION (cross-theme integration): CHARACTERISTICS OF RELATIONSHIPS BETWEEN THE DATASETS, REGISTERS AND REGISTRIES, IDENTIFIERS MANAGEMENT, CONSISTENCY BETWEEN DATA, IDENTIFICATION OF TODO ACTIONS, SELECTION OF INTEGRATION RULES
SET ORDER OF ACTIONS
APPLICATION OF HARMONIZATION AND INTEGRATION RULES
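The branching on this slide can be sketched in Python. The step names are taken from the slide; the function name `plan_actions`, the grouping of steps into two lists, their order, and the reading that integration is only needed when a theme has more than one source dataset are assumptions made for illustration, not the project's actual tooling.

```python
# Step names come from the slide; grouping and order are assumptions.
HARMONISATION_STEPS = [
    "data capturing rules",
    "conformance",
    "data quality",
    "metadata",
    "data transformation",
    "maintenance",
]
INTEGRATION_STEPS = [
    "registers and registries",
    "identifiers management",
    "consistency between data",
]


def plan_actions(theme, datasets):
    """Return an ordered list of (procedure, target, step) actions for one theme."""
    if not datasets:
        raise ValueError("a theme needs at least one source dataset")
    actions = []
    # The harmonisation procedure is applied to each source dataset.
    for dataset in datasets:
        for step in HARMONISATION_STEPS:
            actions.append(("harmonisation", dataset, step))
    # The integration procedure is only needed for more than one dataset.
    if len(datasets) > 1:
        for step in INTEGRATION_STEPS:
            actions.append(("integration", theme, step))
    return actions


# Hypothetical dataset names, used only to show the call.
for action in plan_actions("TN", ["dataset_a", "dataset_b"]):
    print(action)
```

A theme with a single source dataset yields only harmonisation actions; adding a second dataset appends the integration steps at the end of the plan.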
HARMONISATION ENVIRONMENT
USERS: THEMATIC EXPERT, DATA PROCESSOR, SYSTEM ADMINISTRATOR
HARMONISED DATA, Status: PUBLISHED (GML publ., WMS publ., WFS publ.; Online)
HARMONISED DATA, Status: ONGOING
HARMONISATION ENVIRONMENT
USERS: THEMATIC EXPERT, DATA PROCESSOR, SYSTEM ADMINISTRATOR
HARMONISED DATA, Status: ONGOING (GML, WMS, WFS)
HARMONISED DATA, Status: PUBLISHED (Online)
HARMONISATION ENVIRONMENT
SOURCE DATA QUALITY CONTROL
HARMONIZED DATA QUALITY CONTROL
QUALITY CONTROL OF GML FILES
HARMONISED DATA, Status: ONGOING (GML, WMS, WFS; Testing)
HARMONISED DATA, Status: PUBLISHED (GML, WMS, WFS; Online)
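One way to read this slide is that the three quality controls act as gates between the ONGOING (testing) and PUBLISHED (online) status. The sketch below encodes that reading; the rule that all three controls must pass before publication, and the name `status_after_qc`, are assumptions rather than something the slide states.

```python
# The three quality-control stages named on the slide.
QC_STAGES = ("source data", "harmonised data", "GML files")


def status_after_qc(results):
    """Return 'PUBLISHED' only if every QC stage passed, otherwise 'ONGOING'.

    `results` maps a stage name to True (passed) or False (failed);
    a missing stage counts as not passed.
    """
    passed = all(results.get(stage, False) for stage in QC_STAGES)
    return "PUBLISHED" if passed else "ONGOING"


print(status_after_qc({"source data": True, "harmonised data": True}))
print(status_after_qc({stage: True for stage in QC_STAGES}))
```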
EXTRACT FROM USE-CASES Administrative units, Addresses, Transport Networks
ISSUE 1: TARGET INSPIRE SCHEMAS
USE-CASE: Administrative units, Addresses, Transport Networks
Are attributes mandatory or optional, single-valued or multi-valued? The data specification guidelines and GML application schemas define some attributes as optional, but Regulation (EU) No 1089/2010 does not. Which elements do we need to provide to be INSPIRE compliant?
AU sample: Condominium. Does Poland have any condominium, or is it a condominium of another country? What is our <<voidable>> case: unknown (we don't know) or unpopulated (we know but don't say it)?
Is one feature enough for the theme?
[Figure: UML class overview of the Administrative Units application schema with the feature types AdministrativeBoundary, AdministrativeUnit, Condominium and NUTSRegion; in each of them beginLifespanVersion and endLifespanVersion are marked «voidable, lifeCycleInfo»]
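The <<voidable>> question can be made concrete with a small Python sketch: a voidable attribute carries either a value or a reason why the value is absent. The two reasons are the ones named on the slide; the class names `Voidable` and `VoidReason` and the sample date are illustrative assumptions, not part of any INSPIRE software.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class VoidReason(Enum):
    UNKNOWN = "unknown"          # the value is not known to the provider
    UNPOPULATED = "unpopulated"  # the attribute is not maintained in the dataset


@dataclass(frozen=True)
class Voidable(Generic[T]):
    """Holds either a value or the reason why the value is void."""

    value: Optional[T] = None
    reason: Optional[VoidReason] = None

    def __post_init__(self):
        # Exactly one of value / reason must be given.
        if (self.value is None) == (self.reason is None):
            raise ValueError("provide either a value or a void reason")

    @property
    def is_void(self):
        return self.value is None


# Hypothetical lifecycle attributes of one administrative unit.
begin_lifespan = Voidable(value="2011-06-30T00:00:00")
end_lifespan = Voidable(reason=VoidReason.UNKNOWN)
print(begin_lifespan.is_void, end_lifespan.is_void)
```

Forcing a choice between a value and an explicit reason makes the decision asked about on the slide visible in the data instead of leaving an attribute silently empty.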
ISSUE 2: TARGET INSPIRE SCHEMAS
USE-CASE: Administrative units, Addresses, Transport Networks
Do we need to provide placeholders? Placeholders represent feature types to be defined by other themes. GML application schemas include placeholder feature type definitions. Shall we deliver them or omit them?
AU sample: NUTSRegion. NUTSRegion is part of the Statistical Units theme but is also a placeholder in the AU application schema. The available NUTSRegions refer to the year 2006, whereas the administrative units are the most recent ones. What is our <<voidable>> case?
[Figure: UML class overview of the Administrative Units application schema with the feature types AdministrativeUnit, AdministrativeBoundary, Condominium and NUTSRegion]
ISSUE 3: NATIONAL SOURCE DATA ARE NOT ENOUGH?
USE-CASE: Administrative units, Addresses, Transport Networks
Are the data INSPIRE compliant if the features on boundaries are not coherent?
INSPIRE Directive, Art. 10(2): "In order to ensure that spatial data relating to a geographical feature, the location of which spans the frontier between two or more Member States, are coherent, Member States shall, where appropriate, decide by mutual consent on the depiction and position of such common features."
ISSUE 4: SELECTED SOURCE DATA ARE NOT ENOUGH?
USE-CASE: Administrative units, Addresses, Transport Networks
Can we use external, publicly available sources to prepare INSPIRE datasets? External data resources are needed to fill in the feature attributes.
AU sample: NUTSRegion, from EUROSTAT data on NUTS
AD sample: Address postal codes, from Polish Post Offices
ISSUE 5: NUMBER OF SOURCE DATASETS
USE-CASE: Administrative units, Addresses, Transport Networks
256 source datasets
15 source application schemas
Inconsistencies between the datasets
Repeated object identifiers
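Repeated object identifiers across many source datasets can be found with a simple inverted index. The sketch below is a minimal illustration under stated assumptions: each dataset is reduced to a list of identifier strings, and the dataset names and identifiers are made up.

```python
from collections import defaultdict


def find_repeated_ids(datasets):
    """Map every identifier that occurs more than once to the datasets holding it.

    `datasets` maps a dataset name to an iterable of object identifiers.
    """
    seen = defaultdict(list)
    for name, object_ids in datasets.items():
        for object_id in object_ids:
            seen[object_id].append(name)
    return {oid: names for oid, names in seen.items() if len(names) > 1}


# Hypothetical sample: identifier "PL.0002" appears in two datasets.
sample = {
    "dataset_a": ["PL.0001", "PL.0002"],
    "dataset_b": ["PL.0002", "PL.0003"],
}
print(find_repeated_ids(sample))
```

Identifiers repeated inside a single dataset are reported as well, with the dataset name listed once per occurrence.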
ISSUE 6: CROSS-THEME SYNCHRONISATION
USE-CASE: Administrative units, Addresses, Transport Networks
[Figure: cross-theme relationships among INSPIRE themes. Labels on the slide: Administrative Units, Addresses, Transport Networks, Cadastral Parcels, Geographical Names, Buildings, Statistical Units]
ISSUE 6: CROSS-THEME SYNCHRONISATION
USE-CASE: Administrative units, Addresses, Transport Networks
Data need to refer to objects from other themes that are not part of the current harmonization process.
Source data contain some representation of phenomena published under another theme, but no direct reference to a valid object can be found in the dataset.
The only way to obtain valid relationships to features from other themes can be time-consuming SPATIAL ANALYSES executed on millions of objects.
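A minimal sketch of such a spatial analysis is a point-in-polygon join that assigns each address to the administrative unit containing it. It uses plain ray casting on simple polygons without holes; the function names, coordinates and identifiers are hypothetical, and a production workflow would rely on a spatial index rather than this nested loop.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon given by `ring`.

    `ring` is a list of (x, y) vertices; the last vertex connects to the first.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_units(addresses, units):
    """Return {address_id: unit_id or None} by testing every address against every unit."""
    result = {}
    for address_id, (x, y) in addresses.items():
        result[address_id] = next(
            (uid for uid, ring in units.items() if point_in_polygon(x, y, ring)),
            None,
        )
    return result


# Hypothetical data: two square units and three address points.
units = {
    "U1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "U2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
addresses = {"A1": (5, 5), "A2": (15, 5), "A3": (25, 5)}
print(assign_units(addresses, units))
```

The nested loop tests every address against every unit, which is why the cost grows quickly once millions of objects are involved.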
ISSUE 7: NUMBER OF GML FILES
USE-CASE: Administrative units, Addresses, Transport Networks
How should data be organized in GML files? The volume of data provided for INSPIRE exceeds the readable size of GML files.
AU, TN, AD sample (probe data):
THEME                     Matching features   INSPIRE features   Data size
Administrative units      11964               1014               1 GB
Addresses                 3118754             1131524            1.7 GB
Road Transport Networks   1015550             678406             8.9 GB
Rail Transport Networks   131081              35341              3.4 GB
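The probe figures allow a rough estimate of the average encoded size per INSPIRE feature. The sketch below assumes that the data size refers to the output for the INSPIRE features, that GB means 10**9 bytes, and that the last size (shown as "3,4 G" on the slide) is in GB; the name `bytes_per_feature` is illustrative.

```python
# (INSPIRE features, data size in GB) per theme, as listed on the slide.
# Assumption: GB = 10**9 bytes; the rail figure is read as 3.4 GB.
SAMPLE = {
    "Administrative units": (1014, 1.0),
    "Addresses": (1131524, 1.7),
    "Road Transport Networks": (678406, 8.9),
    "Rail Transport Networks": (35341, 3.4),
}


def bytes_per_feature(sample):
    """Return the average number of bytes per feature for each theme."""
    return {
        theme: size_gb * 10**9 / count
        for theme, (count, size_gb) in sample.items()
    }


for theme, value in bytes_per_feature(SAMPLE).items():
    print(f"{theme}: about {value / 1000:.0f} kB per feature")
```

Under these assumptions the per-feature size differs by orders of magnitude between themes, so a single file-size rule is unlikely to fit all of them.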
ISSUE 7: NUMBER OF GML FILES
USE-CASE: Administrative units, Addresses, Transport Networks
What about the features without geometry?
How many GML files can be expected for 32 million area objects?
What are the limits of GML file usability?
What is the file size for practical data upload?
Organization of data in GML files: by area, by feature type, by neighbourhood, or all of the above combined.
THEME                     INSPIRE features   ORGANISATION                 GML files
Administrative units      1014               by feature type              2
Addresses                 1131524            by feature type and area     25
Road Transport Networks   678406             by area                      183
Rail Transport Networks   35341              by area                      89
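The organisation options (by area, by feature type, or both) amount to choosing a grouping key for the output files. A minimal sketch, assuming each feature is a dictionary with hypothetical `id`, `feature_type` and `area` keys; the sample features and area codes are made up:

```python
from collections import defaultdict


def group_into_files(features, by_feature_type=False, by_area=False):
    """Group feature ids into output files keyed by (feature type, area).

    A component of the key that is not used for grouping is set to "*".
    """
    files = defaultdict(list)
    for feature in features:
        key = (
            feature["feature_type"] if by_feature_type else "*",
            feature["area"] if by_area else "*",
        )
        files[key].append(feature["id"])
    return dict(files)


# Hypothetical sample features.
features = [
    {"id": "a1", "feature_type": "Address", "area": "02"},
    {"id": "a2", "feature_type": "Address", "area": "04"},
    {"id": "t1", "feature_type": "ThoroughfareName", "area": "02"},
]
for key, ids in group_into_files(features, by_feature_type=True, by_area=True).items():
    print(key, ids)
```

Each key corresponds to one GML file, so the number of keys is the number of files a given organisation would produce.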
SUMMARY
Many interoperability aspects are already resolved by the INSPIRE specifications; at Member State level the harmonisation process directly concerns just a few of them.
There are formal (legal), operational and technical questions that must be answered prior to harmonisation.
Data harmonisation cannot be done for a single theme in isolation, due to cross-theme issues.
The size of the data forces a rethinking of the concepts for publishing data using GML files.
Two years after the development of the Annex I themes was completed, there are still open questions that affect the final goal: interoperability of datasets and services.
THANK YOU!
Agnieszka Zwirowicz-Rutkowska, GIS Specialist
Institute of Geodesy and Cartography, ul. Modzelewskiego 27, 02-679 Warszawa, Polska
t: +48.(22).329.19.84, f: +48.(22).329.19.50
elzbieta.bielecka@igik.edu.pl
Intergraph Corporation
Alina Kmiecik, IT Analyst, Security, Government & Infrastructure
INSPIRE TWG Expert on Statistical Units & Population distribution-demography
Intergraph Polska Sp. z. o.o., ul. Wólczańska 178, 90-530 Łódź, Polska
t: +48.(42).253.48.20, m: +48.668.466.840, f: +48.(42).253.48.01
alina.kmiecik@intergraph.com
Elżbieta Bielecka, Head of GIS Dep.
INSPIRE TWG Expert on Land cover
Institute of Geodesy and Cartography, ul. Modzelewskiego 27, 02-679 Warszawa, Polska
t: +48.(22).329.19.84, m: +48.603.950.061, f: +48.(22).329.19.50
elzbieta.bielecka@igik.edu.pl