Make Your GIS Work for You, First Step: Data Quality
Presented by:
- Scott Sumners, GIS Manager, City of Brentwood
- James McCord, GIS Analyst, City of Brentwood
- Gerardo Boquin, GISP, CH2M
Brentwood GIS
Timeline years: 1995, 2006, 2009, 2012, 2015, 2016
- CAD to GIS
- Aerial photography
- Topographic base maps
- Schema changes
- Added new layers
- Added subtypes and domains
- New imagery
- GIS Dept. (staff and interns)
- Water billing
- Asset management / VUEWorks
Topics
- Why is Data Quality important?
- Brentwood GIS Case Study
- Introduction to Data Reviewer
- Initial Findings
- Data Quality Assurance Plan (Methodology)
- Running Data Reviewer (Lessons Learned)
- Geometric Networks (Lessons Learned)
- Fixing the data
- The results
- The unperceived benefits
Why is Data Quality Important?
Data needs to be:
- Accessible
- Reliable
- Spatially accurate
- Descriptive
"As a result, water, wastewater, and stormwater utilities are now heavily focusing on quality assurance (QA) and quality control (QC) to ensure that their GIS data truly meets their needs."
Source: ESRI White Paper: GIS Data Quality Best Practices for Water, Wastewater and Stormwater Utilities, July 2011
Why is Data Quality Important?
Summary of tools described in the paper:
- Geodatabases
- A good data model
- Versioned environment (multiple editors)
- Geometric networks
- ModelBuilder / Python
- ArcGIS Data Reviewer
- Production Mapping Workflow Manager
Source: ESRI White Paper: GIS Data Quality Best Practices for Water, Wastewater and Stormwater Utilities, July 2011
Why is Data Quality Important?
Categories of data errors to check:
- Positional accuracy
- Topological logic
- Geometric data considerations
- Projections and coordinate systems
- Attribute and data structure
Source: https://www.linkedin.com/pulse/gis-gigo-garbage-out-30-checks-data-errors-nathan-heazlewood
Why is Data Quality Important?
Dirty, bad, or incomplete data is very expensive:
- Data is not reliable
- Tools don't work correctly
- Creating custom tools is very time consuming
- Time is lost troubleshooting data errors
Brentwood GIS Case Study
GIS workflow:
- Multiple editors
- Typical QA/QC consisted of visual checks
- No reported issues was taken to mean no problems
Until:
- CH2M ran a courtesy check on the sewer GIS data using ArcGIS Data Reviewer
- Overall GIS health was good
- Data Reviewer pointed out areas where business needs were not fully met
Shocking introduction to Data Reviewer in 2015
Consultant calls to say: "I ran Data Reviewer on your data and discovered that YOUR data has lots of ISSUES."
Initial Findings
Issues found: 52,218 (sewer dataset only)
- Non-compliant domain values
- Duplicate geometries
- Multipart geometries
- Duplicate IDs
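Two of the issue types listed above, duplicate and multipart geometries, can be illustrated without any GIS software. The following is a minimal sketch in plain Python, not how Data Reviewer implements its checks. It assumes each feature is a list of parts, each part a list of (x, y) vertices; the feature IDs, coordinates, and the `precision` parameter are hypothetical.

```python
from collections import defaultdict


def find_duplicate_geometries(features, precision=3):
    """Group feature IDs whose vertex lists are identical after rounding."""
    groups = defaultdict(list)
    for fid, parts in features.items():
        # Round coordinates so tiny floating-point differences do not hide duplicates.
        key = tuple(
            tuple((round(x, precision), round(y, precision)) for x, y in part)
            for part in parts
        )
        groups[key].append(fid)
    return [ids for ids in groups.values() if len(ids) > 1]


def find_multipart(features):
    """Return IDs of features stored with more than one part."""
    return [fid for fid, parts in features.items() if len(parts) > 1]


# Hypothetical sewer pipes: 101 and 102 are stacked on top of each other,
# and 103 is one record made of two disjoint segments.
pipes = {
    101: [[(0.0, 0.0), (10.0, 0.0)]],
    102: [[(0.0, 0.0), (10.0, 0.0)]],
    103: [[(0.0, 0.0), (5.0, 0.0)], [(6.0, 0.0), (10.0, 0.0)]],
}

print(find_duplicate_geometries(pipes))  # [[101, 102]]
print(find_multipart(pipes))             # [103]
```

This sketch compares vertices in stored order, so a pipe digitized in the opposite direction would not be reported as a duplicate.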
Initial Findings Geometric Network
Initial Findings
Geometric network problems (numbered on the map):
1. Missing assets
2. Pipes had the wrong flow direction
3. Pipes connected to the wrong asset
Also found: disconnected assets
Issues: missing assets, flow, connectivity
Solutions: add taps and tees, trim/extend pipes, snap features
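Snapping, one of the fixes listed above, amounts to moving a pipe endpoint onto the nearest asset when the two are within a tolerance. The sketch below shows only that idea in plain Python. It is not the ArcGIS snapping tool, and the manhole IDs, coordinates, and tolerance are hypothetical.

```python
import math


def snap_endpoint(endpoint, assets, tolerance):
    """Return (asset_id, asset_xy) for the nearest asset within tolerance, else None."""
    best = None
    best_dist = tolerance
    for asset_id, xy in assets.items():
        dist = math.dist(endpoint, xy)
        if dist <= best_dist:
            best, best_dist = (asset_id, xy), dist
    return best


# Hypothetical manhole locations in map units.
manholes = {"MH-1": (100.0, 200.0), "MH-2": (150.0, 200.0)}

# About 0.5 units from MH-1: close enough to snap.
print(snap_endpoint((100.3, 200.4), manholes, tolerance=1.0))

# 20 units from the nearest manhole: left alone for manual review.
print(snap_endpoint((120.0, 200.0), manholes, tolerance=1.0))
```

Returning `None` instead of forcing a match keeps ambiguous cases visible, which matters when a pipe may be connected to the wrong asset.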
Data Quality Assurance (Methodology)
Established a Data Quality Plan
- Consisted of 4 pre-configured batch files addressing:
1. Invalid geometries
2. Duplicate IDs
3. Duplicate geometries
4. Domain and subtype validation
- Geometric network:
1. Checking all features
2. Customized for critical features
Assets:
1. Flow Monitors
2. Grease_Interceptors
3. GrinderPumps
4. LiftStation
5. Manholes
6. SewerAirReleaseValve
7. SewerControlValve
8. SewerFitting
9. SewerGravityPipe
10. SewerLateralPipe
11. SewerPressurizedPipe
12. SewerService
13. SewerServiceValve
Data Quality Assurance (Methodology)
- CH2M provided hands-on Data Reviewer training
- The City of Brentwood performed the data cleanup itself
- The City of Brentwood submitted a copy of the Sewer and Water feature datasets every 3-4 weeks for review
- CH2M reviewed all datasets consistently and compiled the information for presentation
Running Data Reviewer (Lessons Learned)
1. Always check for invalid geometries before running other checks
2. Create batch files containing custom checks
3. Create multiple batch files and group them logically
4. Create a separate database to store QA/QC results
Running Data Reviewer (Lessons Learned)
More on invalid geometries:
- In general, invalid geometries are the worst errors to fix
- They can cause tools to crash or produce incomplete results
Does this sound familiar?
https://blogs.esri.com/esri/arcgis/2012/03/28/invalid-geometry-check-explained/
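The "invalid geometries first" lesson can be pictured as a gate: features that fail a basic geometry test are set aside before any other check runs. The following is a minimal sketch of that ordering in plain Python. It covers only null, empty, and zero-length lines, which is an assumption made for illustration and far narrower than what Data Reviewer's Invalid Geometry check examines; the feature IDs are hypothetical.

```python
def is_invalid_polyline(vertices):
    """Return a reason string if the polyline is unusable, else None."""
    if vertices is None:
        return "null geometry"
    if len(vertices) == 0:
        return "empty geometry"
    if len(set(vertices)) < 2:
        # All vertices coincide, so the line has no length.
        return "zero-length line"
    return None


def split_by_validity(features):
    """Separate features into those safe to check further and those to fix first."""
    valid, invalid = {}, {}
    for fid, vertices in features.items():
        reason = is_invalid_polyline(vertices)
        if reason is None:
            valid[fid] = vertices
        else:
            invalid[fid] = reason
    return valid, invalid


# Hypothetical pipes keyed by ObjectID.
pipes = {
    1: [(0.0, 0.0), (10.0, 0.0)],
    2: None,
    3: [],
    4: [(5.0, 5.0), (5.0, 5.0)],
}

valid, invalid = split_by_validity(pipes)
print(sorted(valid))  # [1]
print(invalid)        # {2: 'null geometry', 3: 'empty geometry', 4: 'zero-length line'}
```

Only the `valid` set would then be passed to the duplicate, domain, and network checks, so those checks do not crash or return partial results on broken shapes.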
Geometric Networks (Lessons Learned)
- Geometric networks do not support features with M or Z values
  - This might require a data model change
- At 10.2, geometric networks didn't accept versioned features
  - Created database replicas to perform geometric network fixes
- 10.3 accepts versioned features
http://resources.arcgis.com/en/help/main/10.2/index.html#/in_arccatalog/002r00000009000000/
http://desktop.arcgis.com/en/arcmap/10.3/manage-data/geometric-networks/geometric-networks-and-versioned-geodatabases.htm
Fixing the data
- Systematically mining the data by ObjectID
- Organizing the data
Fixing the data
Results: GIS Data vs. Domain/Subtype Errors
[Chart: number of features and number of validation errors by review round (Base, R1, R2, R3). Feature counts shown: 29965, 30120, 30887, 30873. Error counts shown: 71718, 61216, 21, 20.]
Results
Schema changes:
- Removed unnecessary fields
- Added new domain values to fill in the Nulls
Used Data Reviewer:
- Identified non-compliant values and fixed them
- Sorted fields ascending (typed values were on top)
Tackled one attribute problem at a time:
- Resolved invalid values first
- Addressed Null values second
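The attribute cleanup order described above (invalid values first, Nulls second) can be sketched as a simple domain check. This is an illustration in plain Python, not the Data Reviewer domain check itself. The field name `MATERIAL`, the allowed values, and the sample rows are hypothetical.

```python
def validate_domain(rows, field, allowed):
    """Split problem rows into invalid values and nulls for a coded-value field."""
    invalid, nulls = [], []
    for row in rows:
        value = row[field]
        if value is None or str(value).strip() == "":
            nulls.append(row["OBJECTID"])
        elif value not in allowed:
            invalid.append((row["OBJECTID"], value))
    # Sort by value so identical typos sit next to each other for bulk fixing.
    invalid.sort(key=lambda item: str(item[1]))
    return invalid, nulls


# Hypothetical pipe material domain and attribute rows.
material_domain = {"PVC", "DIP", "VCP"}
rows = [
    {"OBJECTID": 1, "MATERIAL": "PVC"},
    {"OBJECTID": 2, "MATERIAL": "pvc "},
    {"OBJECTID": 3, "MATERIAL": None},
    {"OBJECTID": 4, "MATERIAL": "DIP"},
    {"OBJECTID": 5, "MATERIAL": "Clay?"},
]

invalid, nulls = validate_domain(rows, "MATERIAL", material_domain)
print(invalid)  # [(5, 'Clay?'), (2, 'pvc ')]
print(nulls)    # [3]
```

Keeping the two lists separate mirrors the one-problem-at-a-time approach: the invalid list is worked until it is empty, and only then are the Nulls addressed.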
Results: GIS Data vs. Duplicate ID Errors
[Chart: number of features and number of duplicate IDs by review round (Base, R1, R2, R3). Feature counts shown: 29965, 30120, 30887, 30873. Duplicate ID counts shown: 1489, 1010, 1083, 927.]
Created a Python script to assign new IDs automatically
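The script itself is not shown in the slides. The following is a minimal sketch of how automatic ID assignment could work, assuming IDs follow a prefix-plus-number pattern and that the first record holding an ID keeps it. The field name `ASSET_ID`, the `MH-` prefix, and the sample rows are all hypothetical.

```python
def assign_new_ids(rows, id_field="ASSET_ID", prefix="MH-"):
    """Give blank IDs and repeated IDs the next unused number; return the changes made."""
    # Find the highest number already in use so new IDs never collide.
    highest = 0
    for row in rows:
        value = row[id_field]
        if value and value.startswith(prefix) and value[len(prefix):].isdigit():
            highest = max(highest, int(value[len(prefix):]))

    used = set()
    changes = []
    for row in rows:
        value = row[id_field]
        if not value or value in used:
            highest += 1
            new_value = f"{prefix}{highest}"
            changes.append((row["OBJECTID"], value, new_value))
            row[id_field] = new_value
            value = new_value
        used.add(value)
    return changes


# Hypothetical manhole records: ObjectID 3 repeats MH-2 and ObjectID 4 has no ID.
rows = [
    {"OBJECTID": 1, "ASSET_ID": "MH-1"},
    {"OBJECTID": 2, "ASSET_ID": "MH-2"},
    {"OBJECTID": 3, "ASSET_ID": "MH-2"},
    {"OBJECTID": 4, "ASSET_ID": None},
    {"OBJECTID": 5, "ASSET_ID": "MH-5"},
]

print(assign_new_ids(rows))  # [(3, 'MH-2', 'MH-6'), (4, None, 'MH-7')]
```

In an ArcGIS environment the same loop could presumably run inside an `arcpy.da.UpdateCursor`. The returned change list is worth keeping, since linked systems may still reference the old IDs.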
Results
- The Unique ID check helped identify Backflow Control Valves that did not have an ID assigned in the backflow control database
- Both the Backflow Control Engineer and the GIS Analyst were aware of the issue
- The GIS Analyst followed up with the Engineer to assign IDs
- Brentwood is audited once a year by the state, and lacking unique IDs that correlate to the backflow control tables would have been grounds for failing the audit
Results: GIS Data vs. Geometric Errors
[Chart: number of features and number of geometric errors by review round (Base, R1, R2, R3). Feature counts shown: 29965, 30120, 30887, 30873. Geometric error counts shown: 463, 458, 142, 23.]
The unperceived benefits
1. Integrated systems work correctly
   a) Data that is part of a business process needs to be properly maintained
2. Robust datasets
   a) Source of truth
   b) Better reporting and access to more spatial analysis capabilities
3. QA/QC industry best practices
4. Faster tool development
   a) Standardized data = less time spent developing workflows to troubleshoot analysis workflows
Conclusions
- Data is dynamic, and maintaining data quality is a continuous process
- It's expensive to have dirty or incomplete data
- ArcGIS Data Reviewer is easily customizable to:
  - Check data (batch or single check)
  - Produce data health reports
- A combination of visual and computerized checks is the best formula for maintaining data quality
Thank you for your time.
Scott Sumners: scott.sumners@brentwoodtn.gov
James McCord: james.mccord@brentwoodtn.gov
Gerardo Boquin: gerardo.boquin@ch2m.com