DATA VALIDATION AGAINST SCHEMA AND SOURCE DATA
didier.bouteloup@ign.fr; dominique.laurent@ign.fr
3 June 2016
ign.fr
Context
IGN has performed data validation twice:
- on test INSPIRE data (2013-2014);
- on ELF data (ongoing).
VALIDATION OF TEST DATA
Context
IGN decided to transform test data to INSPIRE:
- for 3 «simple» themes: AU, GN, AD;
- on 3 cross-border «départements» (average area: 6 000 km²);
- data supplied in GML files as predefined packages.
Objective: provide sample data for users to test.
- Need for validation
- Need for documentation (in French)
Data transformation methodology
[Diagram: Production Unit and Delivery Unit, with the production database (PostGIS), the archive database, external products and INSPIRE data]
Data transformation methodology
Define transformation rules:
- in Excel matching tables;
- during meetings between:
  - Production unit: source data knowledge
  - Standardisation unit: INSPIRE knowledge
  - Delivery unit: IT and transformation knowledge
Two-step process:
1. Transformation to a pseudo-INSPIRE database
2. GML generated from WFS (deegree)
Data validation process
Responsible: the unit for external products, i.e. the team in charge of validating and documenting IGN external products.
Data validation process
Input:
- matching tables + minutes of the matching-table meetings
- INSPIRE data
- source data on the same area
Objectives:
- validate INSPIRE data against the schema, to ensure that the data conform to INSPIRE;
- validate INSPIRE data against the source data, to ensure that the transformation process has not introduced errors;
- on whole datasets.
Validation against schema
Input: INSPIRE schema (XSD file) and INSPIRE data
Use of XML Spy
Difficulties:
- it is not easy to know exactly what is checked;
- example: code lists are not included in the XSD, so code-list values were checked by hand in phase 2 (validation against data).
Validation against schema
Difficulties:
- iterative process: the tool stops at the first error, so each error must be corrected and the tool run again;
- most errors are in the GML header;
- error messages are not clear.
Globally, few errors:
- mainly path errors: expected resources not found;
- data knowledge is not enough: one also needs to understand the INSPIRE infrastructure and registries.
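Since code lists are not in the XSD and the schema tool stops at the first error, a separate check that reports every invalid code-list value in one pass may help. The sketch below assumes code-list values are encoded as `xlink:href` attributes; the namespace `urn:example:gn`, the element name and the allowed URIs are hypothetical placeholders, not taken from the slides.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

def check_code_list(gml_text, element_tag, allowed):
    """Return every (position, value) whose xlink:href is not in `allowed`.

    Unlike a validator that stops at the first error, all violations are
    collected in a single pass over the document."""
    root = ET.fromstring(gml_text)
    errors = []
    for i, elem in enumerate(root.iter(element_tag)):
        value = elem.get(XLINK_HREF)  # None if the attribute is missing
        if value not in allowed:
            errors.append((i, value))
    return errors

# Hypothetical code list and sample data, for illustration only.
ALLOWED = {
    "http://inspire.ec.europa.eu/codelist/NativenessValue/endonym",
    "http://inspire.ec.europa.eu/codelist/NativenessValue/exonym",
}
SAMPLE = """<root xmlns:gn="urn:example:gn" xmlns:xlink="http://www.w3.org/1999/xlink">
  <gn:nativeness xlink:href="http://inspire.ec.europa.eu/codelist/NativenessValue/endonym"/>
  <gn:nativeness xlink:href="http://inspire.ec.europa.eu/codelist/NativenessValue/endonyme"/>
  <gn:nativeness/>
</root>"""
```

With this sample, the misspelt value (position 1) and the element with no value (position 2) are both reported.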
Validation against data
Principle: source data and transformed data are compared according to matching rules, producing error reports.
All tests are home-designed.
Use of FME:
- import source and INSPIRE data;
- use the toolkit to perform the comparisons and produce error reports, with no coding.
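The principle above (source data, transformed data, matching rules, error reports) can be sketched outside FME as a comparison keyed on feature identifiers. This is a minimal sketch, assuming both datasets are already loaded as dictionaries; the attribute names (`nom`, `name`) and the rule format are hypothetical.

```python
def compare_datasets(source, target, rules):
    """Compare source and INSPIRE features sharing an identifier.

    source, target: dict identifier -> dict of attributes.
    rules: list of (source_attr, target_attr, transform) matching rules.
    Returns a list of (identifier, message) error-report entries."""
    report = []
    for fid in sorted(source.keys() - target.keys()):
        report.append((fid, "missing in INSPIRE data"))
    for fid in sorted(target.keys() - source.keys()):
        report.append((fid, "not in source data"))
    for fid in sorted(source.keys() & target.keys()):
        for s_attr, t_attr, transform in rules:
            value = source[fid].get(s_attr)
            expected = transform(value) if value is not None else None
            actual = target[fid].get(t_attr)
            if expected != actual:
                report.append((fid, f"{t_attr}: expected {expected!r}, got {actual!r}"))
    return report

# Hypothetical example: the rule says the INSPIRE name is the upper-cased source name.
SOURCE = {"A1": {"nom": "Lyon"}, "A2": {"nom": "Paris"}, "B1": {"nom": "Nice"}}
TARGET = {"A1": {"name": "LYON"}, "A3": {"name": "NANTES"}, "B1": {"name": "NIZZA"}}
RULES = [("nom", "name", str.upper)]
```

Here `compare_datasets(SOURCE, TARGET, RULES)` reports A2 as missing, A3 as unexpected and a name mismatch on B1.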
Validation against data
Example: checking the transformation of the local types used for «populated areas» in the source data to «populatedarea» in theme GN
[Diagram: source data, INSPIRE data and error reports]
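A check of this kind can be sketched as a two-way test: every feature with a «populated» local type should receive the target type, and no other feature should. The local type names below are hypothetical, and we assume the target value is the GN `NamedPlaceTypeValue` entry `populatedPlace`; since the GN type attribute is multi-valued, each INSPIRE feature carries a list of types.

```python
# Hypothetical matching table: source local types that should map to the
# populated-place type in theme GN.
POPULATED_LOCAL_TYPES = {"Ville", "Village", "Hameau"}
TARGET_TYPE = "populatedPlace"  # assumed target code-list value

def check_type_mapping(pairs):
    """pairs: iterable of (feature_id, source_local_type, inspire_types).

    Reports features that should have the target type but do not, and
    features that have it without a matching local type."""
    errors = []
    for fid, src_type, inspire_types in pairs:
        should_be_populated = src_type in POPULATED_LOCAL_TYPES
        is_populated = TARGET_TYPE in inspire_types
        if should_be_populated and not is_populated:
            errors.append((fid, "missing " + TARGET_TYPE))
        elif is_populated and not should_be_populated:
            errors.append((fid, "unexpected " + TARGET_TYPE))
    return errors

PAIRS = [
    ("N1", "Ville", ["populatedPlace"]),
    ("N2", "Hameau", ["other"]),
    ("N3", "Lac", ["hydrography", "populatedPlace"]),
    ("N4", "Lac", ["hydrography"]),
]
```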
Validation against data
Example: identifier validation for theme AD, using the feature geometry
[Diagram: INSPIRE data, source data and error reports]
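One way to use the geometry to validate identifiers is to look up each INSPIRE address in the source data by identifier and check that both points coincide within a tolerance. This sketch assumes point geometries already expressed in the same projected CRS in metres (e.g. Lambert-93); the 1 m tolerance and the coordinates are illustrative assumptions.

```python
import math

def check_identifiers(source_points, inspire_points, tolerance_m=1.0):
    """Both arguments: dict identifier -> (x, y) in the same projected CRS.

    An identifier is suspicious if it is unknown in the source data or if
    the two points are further apart than `tolerance_m`."""
    errors = []
    for fid, (x, y) in sorted(inspire_points.items()):
        if fid not in source_points:
            errors.append((fid, "identifier unknown in source"))
            continue
        sx, sy = source_points[fid]
        distance = math.hypot(x - sx, y - sy)
        if distance > tolerance_m:
            errors.append((fid, f"moved {distance:.1f} m"))
    return errors

SOURCE_AD = {"AD1": (652000.0, 6862000.0), "AD2": (652100.0, 6862050.0)}
INSPIRE_AD = {"AD1": (652000.3, 6862000.4), "AD2": (652130.0, 6862090.0), "AD9": (0.0, 0.0)}
```

AD1 is within tolerance (0.5 m), AD2 is 50 m away and AD9 does not exist in the source.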
Validation against data
Test examples:
- check the CRS transformation;
- code lists: hand-made creation and checking, to check conformity to INSPIRE;
- compare the geometry of features having the same identifier;
- check the correspondence of attributes;
- make counts.
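The "make counts" test can be sketched as comparing, per INSPIRE feature type, the number of features expected from the source data with the number actually delivered. This assumes a one-to-one mapping from source feature type to INSPIRE feature type; the source type names in the mapping are hypothetical.

```python
from collections import Counter

def compare_counts(source_types, inspire_types, mapping):
    """source_types, inspire_types: iterables of feature-type names, one per feature.
    mapping: source type -> INSPIRE feature type (assumed one-to-one).

    Returns {inspire_type: (expected, actual)} for every type whose counts differ."""
    expected = Counter(mapping[t] for t in source_types if t in mapping)
    actual = Counter(inspire_types)
    diffs = {}
    for ftype in sorted(expected.keys() | actual.keys()):
        if expected[ftype] != actual[ftype]:
            diffs[ftype] = (expected[ftype], actual[ftype])
    return diffs

# Hypothetical matching table and feature lists.
MAPPING = {"COMMUNE": "AdministrativeUnit", "LIMITE": "AdministrativeBoundary"}
SOURCE_TYPES = ["COMMUNE"] * 3 + ["LIMITE"] * 5
INSPIRE_TYPES = ["AdministrativeUnit"] * 3 + ["AdministrativeBoundary"] * 4
```

Where the transformation splits or merges features, the expected counts would have to be derived from the matching rules instead.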
Validation against data
Validation results:
- association implementation: syntax (use of #);
- the feature targeted by an association may be missing at the département boundary.
Not many errors.
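Associations written as local references (`xlink:href="#id"`) can be checked by collecting every `gml:id` in the file and listing the references with no target. This minimal sketch only resolves references within one file; the `urn:example:au` namespace and the identifiers are hypothetical. A missing target at a département boundary would show up in exactly this list.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
GML_ID = "{http://www.opengis.net/gml/3.2}id"

def unresolved_local_links(gml_text):
    """Return every '#id' reference whose target gml:id is not in the document."""
    root = ET.fromstring(gml_text)
    ids = {e.get(GML_ID) for e in root.iter() if e.get(GML_ID) is not None}
    missing = []
    for elem in root.iter():
        href = elem.get(XLINK_HREF)
        # Only local references ("#...") are checked; external URLs are ignored.
        if href and href.startswith("#") and href[1:] not in ids:
            missing.append(href)
    return missing

SAMPLE_AU = """<root xmlns:gml="http://www.opengis.net/gml/3.2"
      xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:au="urn:example:au">
  <au:AdministrativeUnit gml:id="AU_1">
    <au:boundary xlink:href="#AB_1"/>
    <au:boundary xlink:href="#AB_2"/>
  </au:AdministrativeUnit>
  <au:AdministrativeBoundary gml:id="AB_1"/>
</root>"""
```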
Validation against data
Validation difficulties:
- attributes with multiplicity [0..*] or [1..*] are integrated in FME as lists, and we needed to learn how to handle these lists;
- the GML file for theme AD could not be opened in an editor (too big for Word or Notepad), so we had to make a sample to understand its content and design the control tests; the FME automatic controls can then run on the whole file;
- designing all the tests is a lot of work (even for the simple themes AU, AD and GN).
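When a GML file is too big for a text editor, a sample can be extracted by streaming through it rather than loading it whole. This sketch uses `xml.etree.ElementTree.iterparse` to count the features of one type and keep the first few as text; the feature tag and the in-memory demo file are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def sample_features(source, feature_tag, limit):
    """Stream through a (possibly huge) GML file.

    Returns (total number of `feature_tag` elements, first `limit` of them
    serialised as strings). Each processed feature is cleared to free its content."""
    sample, total = [], 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == feature_tag:
            total += 1
            if len(sample) < limit:
                sample.append(ET.tostring(elem, encoding="unicode"))
            elem.clear()
    return total, sample

if __name__ == "__main__":
    # Demo on a small in-memory file standing in for the real AD delivery.
    demo = (b'<root xmlns:ad="urn:example:ad">'
            + b"".join(b"<ad:Address><ad:n>%d</ad:n></ad:Address>" % i for i in range(5))
            + b"</root>")
    print(sample_features(io.BytesIO(demo), "{urn:example:ad}Address", 2))
```

`source` may be a file name or a binary file object, so the same function works on the real file.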
Validation against data
Potential improvements:
- validation of code lists: have a common solution in Europe;
- improve the documentation of transformation rules:
  - better matching tables: more comments, avoid merged cells;
  - even better, replace the Excel matching tables by a transformation database;
- make validation work easier and more accessible:
  - provide more synthetic documentation;
  - final minutes with a summary of the main decisions;
  - document the final results (what is filled and has to be validated).
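Replacing Excel matching tables by a transformation database could look like the sketch below: one row per INSPIRE attribute, no merged cells, a free comment column, and queries instead of visual inspection. The table layout and the rows are hypothetical; SQLite is used only because it ships with Python. The query lists unpopulated INSPIRE attributes, the information the Excel tables show by highlighting in red.

```python
import sqlite3

def build_rules_db(rows):
    """Create an in-memory matching-rule table from (theme, inspire_attribute,
    source_table, source_attribute, rule, comment) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE matching_rule (
        inspire_theme     TEXT NOT NULL,
        inspire_attribute TEXT NOT NULL,
        source_table      TEXT,
        source_attribute  TEXT,
        rule              TEXT,
        comment           TEXT)""")
    con.executemany("INSERT INTO matching_rule VALUES (?, ?, ?, ?, ?, ?)", rows)
    return con

def unpopulated(con, theme):
    """INSPIRE attributes of `theme` with no source attribute."""
    cursor = con.execute(
        "SELECT inspire_attribute FROM matching_rule "
        "WHERE inspire_theme = ? AND source_attribute IS NULL "
        "ORDER BY inspire_attribute",
        (theme,),
    )
    return [row[0] for row in cursor]

ROWS = [
    ("GN", "name", "TOPONYME", "NOM", "copy", None),
    ("GN", "localType", "TOPONYME", "NATURE", "lookup", "see local type table"),
    ("GN", "relatedSpatialObject", None, None, None, "not populated"),
]
```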
Validation against data
The INSPIRE feature type may be difficult to find.
Validation against data
Unpopulated concepts are highlighted in red.
VALIDATION OF ELF DATA
Context
In the ELF project, we will:
- check whether IGN services conform to WFS;
- check that GML data conform to the XSD;
- check the validity of the transformation (INSPIRE data against source data).
There were some issues in making the ELF tools work, but work on the last 2 steps is ongoing.
Main changes
Data supplied by WFS (instead of predefined packages), but GIS software is unable to fully handle INSPIRE data:
- need to extract GML files and run the controls on these files.
Controls on test areas:
- from validation on the whole dataset to validation on sample data;
- need to decide on these test areas, with data that is representative enough;
- the size of the test area may vary according to feature type (airports: the whole of France; addresses: a municipality may be enough).
This will be done based on ISO 19157 principles.
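Test areas that vary by feature type can be expressed as a small table: no area (whole dataset) for airports, a municipality-sized box for addresses. This sketch uses representative points and rectangular extents only; the extents are made up, and lines or polygons would need an intersection test. It does not implement the ISO 19157 sampling procedures themselves.

```python
from typing import NamedTuple

class BBox(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, x: float, y: float) -> bool:
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax

# Hypothetical test areas per feature type; None means "whole dataset".
TEST_AREAS = {
    "Airport": None,                                     # whole of France
    "Address": BBox(650000, 6860000, 655000, 6865000),  # one municipality (made-up extent)
}

def select_sample(features, test_areas):
    """features: iterable of (feature_type, x, y).

    Keeps features inside the test area of their type; types without an
    entry in `test_areas` are not sampled at all."""
    sample = []
    for ftype, x, y in features:
        if ftype not in test_areas:
            continue
        area = test_areas[ftype]
        if area is None or area.contains(x, y):
            sample.append((ftype, x, y))
    return sample

FEATURES = [
    ("Airport", 100.0, 200.0),
    ("Address", 652000.0, 6862000.0),
    ("Address", 700000.0, 6862000.0),
    ("Road", 652000.0, 6862000.0),
]
```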
Main changes
Extraction tool to be developed:
- to get sample data, queries need to be organised;
- issue with the maximum number of features: make sure that all features in an area have been extracted (to enable comparison with source data);
- likely using the FME toolkit.
FME might be used in addition to XML Spy to check conformity to the schema.
Technological evolutions:
- a common solution to validate code lists?
- tool evolution.
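To make sure all features in an area are extracted despite a server-side maximum, one option is WFS 2.0 paging with the `STARTINDEX` and `COUNT` parameters. This sketch advances by the number of features actually received and stops only on an empty page, so a server cap lower than the requested page size does not silently truncate the result. The service URL, type name and `fake_fetch` stand-in are hypothetical. Paging also assumes the server returns features in a stable order.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def getfeature_url(base_url, type_name, bbox, start, count):
    """Build a WFS 2.0 KVP GetFeature request for one page of results."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "2.0.0",
        "REQUEST": "GetFeature",
        "TYPENAMES": type_name,
        "BBOX": ",".join(str(v) for v in bbox),
        "STARTINDEX": start,
        "COUNT": count,
    }
    return base_url + "?" + urlencode(params)

def fetch_all(base_url, type_name, bbox, fetch, page_size=1000):
    """Page through the service until an empty page is returned.

    `fetch` is any callable taking a URL and returning a list of features."""
    features, start = [], 0
    while True:
        page = fetch(getfeature_url(base_url, type_name, bbox, start, page_size))
        if not page:
            return features
        features.extend(page)
        start += len(page)  # advance by what was received, not by page_size

# Stand-in for a real HTTP call: a fake service holding 2 500 features
# that never returns more than 1 000 per request.
FAKE_DATA = list(range(2500))

def fake_fetch(url):
    query = parse_qs(urlparse(url).query)
    start = int(query["STARTINDEX"][0])
    count = min(int(query["COUNT"][0]), 1000)
    return FAKE_DATA[start:start + count]
```

Even with `page_size=2000`, `fetch_all` retrieves all 2 500 features from the capped fake service.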
Remaining tasks
Make the ELF tools work:
- bug in the OGC tool;
- syntax issue with the ELF tools: use of capitals (ELF): GETCAPABILITIES vs. standardised use (IGN): GetCapabilities.
Design the control process for other themes: TN, HY, BU (may be more complex for TN and HY).
Improve the documentation of our transformation rules.
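The GETCAPABILITIES / GetCapabilities mismatch can be worked around on the client side by normalising request URLs before sending them. As we understand OGC KVP encoding, parameter names are case-insensitive while values such as the request name are case-sensitive. This sketch upper-cases parameter names and maps request values to their canonical spelling; the operation list and URL are illustrative, and the function is hypothetical, not part of the ELF tools.

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Canonical WFS operation names, keyed by their upper-case form.
CANONICAL_REQUESTS = {
    r.upper(): r for r in ("GetCapabilities", "DescribeFeatureType", "GetFeature")
}

def normalise_wfs_url(url):
    """Upper-case KVP parameter names and restore the canonical case of REQUEST."""
    parts = urlparse(url)
    params = []
    for name, value in parse_qsl(parts.query):
        name = name.upper()  # parameter names are case-insensitive
        if name == "REQUEST":
            value = CANONICAL_REQUESTS.get(value.upper(), value)
        params.append((name, value))
    return urlunparse(parts._replace(query=urlencode(params)))
```

For example, `normalise_wfs_url("https://example.org/wfs?service=WFS&request=GETCAPABILITIES")` yields `https://example.org/wfs?SERVICE=WFS&REQUEST=GetCapabilities`.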