ENGRG 59910 Introduction to GIS
Michael Piasecki
October 5, 2014
Lecture 05: GIS and Databases Basics

Where are we now?
- Basic geographic concepts
  - Introduction to GIS, coordinate system, projection, datum
- Data: Acquisition, Input and Management
  - Data model: vector vs. raster
  - Data source: map, attribute data (geocoding), GPS, remote sensing
  - Data input: digitizing
  - Data quality and metadata
  - Data management: database
- Analysis
- Output: map design

Context of what we are learning
[Diagram labels: Satellite Images, Aerial Photographs, Geo Reference, Maps, Digitizing, GPS (later), Non-spatial data (Attribute Data), Join, Relate, Geocoding, Spatial Database]

GIS Definition 2: Break Down Words
- a System: a group of connected entities and activities
- an Information System: a set of procedures, executed on raw data, to produce information for decision making
- a Geographic Information System: an information system that uses geographically referenced data

Today's Outline
- Introduction to Database Management System (DBMS)
  - Recognize four database types
- Relational Database Basics
  - Basic understanding of database theory
- GeoDatabase Overview
- Demo: Common GIS database operations
  - Join/Relate
  - Spatial Join

Evolution of GIS Environments
[Figure]

Flat File vs. Database
- Flat Files
  - Flat files are easier to understand
  - Difficult to manage and manipulate
  - Large file size
- Database
  - Data is organized or structured using a database model
  - Reduces data redundancy
  - Improves data integrity
  - Can be queried: many databases use the same language, SQL (Structured Query Language), for formulating queries
- DATABASE = Data file(s) + data organization + processing ready

Database Management Systems
- Goal for any DBMS: efficient searching and linking of tabular data.
- GIS DBMS goal: efficient manipulation (including search and linking) of spatial objects (points, lines, polygons, polylines), relationships between objects, and tabular data (i.e., topology, attributes).
[Diagram labels: Data, Software, Hardware, Database, DBMS, Management System]

Field vs. Column; Record vs. Row

Forest      Trail            Feature
Nantahala   Bryson's Knob    Vista
Cherokee    Slickrock Falls  Ogrth
Pisgah      Chimney Rock     Wlife

- Field: one item of information per object (column)
- Record: information items about one object (row)
- Typically you operate on a field or column and select records or rows.
- A map object lights up when a row is selected.

Ways of Organizing Information
There are four basic database structures:
- Traditional
  - Hierarchical
  - Network
  - Relational
- Recent development
  - Object-Oriented (O-O)
The relational database is the most widely used.

Evolution of DBMS Technology
[Diagram labels: File System, Network DBMS, Hierarchical DBMS, Relational DBMS, Object-Oriented System (OODBMS), Object-Relational (ORDBMS)]

Example of Data Organization
[Figure]

Example of Data Organization
[Figure, from Chang's book, p. 161]

The Basics of Relational Databases

Relational Database
- A database includes multiple tables
- Tables are joined by relationships
- The relational model is grounded in mathematics: relational algebra defines the mathematical rules by which tables are manipulated.
- Any kind of attribute search (lateral, vertical) is possible.
- Examples of relational database programs: Microsoft Access, Microsoft SQL Server, Oracle, DB2, FoxPro, MySQL, PostgreSQL

Why Use a Relational Database?
- Eliminate duplicate information
- Assist in querying data
- Simpler to manipulate data
- Reduce disk space
- The relational model has been the most successful within GIS (and within the database world in general)

Relational Databases: Terminology
- Key Fields (2 keys)
- Relationships (3 relationships)
- Referential Integrity
- Database Normalization

Key fields
- Keys are used to create uniqueness and link tables together
- Primary Key: uniqueness, eliminates redundancy
- Foreign Key: links tables, establishes relationships between tables

Primary Key
- Primary keys uniquely identify each record in a table.
- The primary key of one table becomes the foreign key in another table.

Foreign Key
[Figure]

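The two key types can be sketched with Python's built-in sqlite3 module. The forest and trail names come from the lecture's example table; the table layout, the column names, and the trail_id column are assumptions made only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forest (
    forest_id   INTEGER PRIMARY KEY,   -- primary key: one unique value per forest
    forest_name TEXT NOT NULL
);
CREATE TABLE trail (
    trail_id   INTEGER PRIMARY KEY,    -- hypothetical key for the trail table
    forest_id  INTEGER NOT NULL REFERENCES forest(forest_id),  -- foreign key
    trail_name TEXT NOT NULL
);
INSERT INTO forest VALUES (1, 'Nantahala'), (2, 'Cherokee');
INSERT INTO trail VALUES
    (1, 1, 'Bryson''s Knob'),
    (2, 2, 'Slickrock Falls'),
    (3, 1, 'North Fork');
""")

# A second row with the same primary key value is rejected (uniqueness).
try:
    conn.execute("INSERT INTO forest VALUES (1, 'Pisgah')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The foreign key links each trail back to exactly one forest.
rows = conn.execute("""
    SELECT f.forest_name, t.trail_name
    FROM trail AS t JOIN forest AS f ON t.forest_id = f.forest_id
    ORDER BY t.trail_id
""").fetchall()
```

Here forest_id plays both roles: it is the primary key of forest and the foreign key in trail.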
Data relationships: Cardinalities
- One to One (1:1): CCNY ID and Students
  - Each CCNY student has a unique ID number (i.e., functional redundancy)
- One to Many (1:M): ENGRG 59910 and ENG Students
  - Many students attend this class
- Many to Many (M:M): Students and Classes (Courses)
  - Many students are enrolled in many classes
  - Each student is taking many classes
  - Each class has many students

One-to-One Relationships
- Only one matching record
- Uses the primary key for both tables
- Used to limit access or isolate information

One-to-Many Relationships
- Most common type of relationship
- Relates tables through primary and foreign keys

Many-to-Many Relationships
- Not directly supported between tables
- Use a junction table to relate them
- One order, many products
- One product, many orders

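The junction-table idea for the order/product example can be sketched with sqlite3. All table names, column names, and sample rows here are hypothetical; only the orders-and-products scenario comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders   (order_id   INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
-- Junction table: one row per (order, product) pairing.
CREATE TABLE order_items (
    order_id   INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO orders   VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO products VALUES (10, 'GPS unit'), (11, 'Field notebook');
INSERT INTO order_items VALUES (1, 10), (1, 11), (2, 10);
""")

# One order, many products.
products_in_order_1 = [r[0] for r in conn.execute("""
    SELECT p.name
    FROM order_items AS oi JOIN products AS p ON p.product_id = oi.product_id
    WHERE oi.order_id = 1
    ORDER BY p.product_id
""")]

# One product, many orders.
orders_with_gps = [r[0] for r in conn.execute("""
    SELECT order_id FROM order_items WHERE product_id = 10 ORDER BY order_id
""")]
```

The M:M relationship is thus expressed as two 1:M relationships that meet in order_items.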
Referential Integrity
- Maintains data accuracy
- Prevents orphan records
- Keeps relationships intact
[Diagram: PK linked to FK]

Referential Integrity
[Figure]

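A small sketch of how a DBMS can enforce referential integrity, using sqlite3. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; the schema and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE forest (forest_id INTEGER PRIMARY KEY, forest_name TEXT);
CREATE TABLE trail (
    trail_id   INTEGER PRIMARY KEY,
    forest_id  INTEGER NOT NULL REFERENCES forest(forest_id),
    trail_name TEXT
);
INSERT INTO forest VALUES (1, 'Nantahala');
INSERT INTO trail  VALUES (1, 1, 'North Fork');
""")

def is_rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# An orphan record: a trail pointing at a forest that does not exist.
orphan_rejected = is_rejected("INSERT INTO trail VALUES (2, 99, 'Ghost Trail')")

# Deleting a parent that still has child records would also create orphans.
delete_rejected = is_rejected("DELETE FROM forest WHERE forest_id = 1")
```

Both statements fail, so the PK/FK relationship between the two tables stays intact.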
Form Analysis: Normalization
- Reduce Duplication
- Improve Accuracy
- Data Maintenance

Form Analysis: Normal Forms
- Notice the redundancies in this table
- Put the database into the First Normal Form
[Figure]

First Normal Form
- Basically a rearrangement of the data
- Redundant columns are removed
- Functional dependencies are rampant
  - E.g., Tship_ID -> Tship_name, Thall_add
  - Tship_ID = 12 is always named Birch and is located at latitude 15W
  - Allows you to identify groups!
- Fields in the parcel dataset (Parcels): Parcel_ID, Alderman, Tship_ID, Tship_name, Thall_add, Own_ID, Own_name, Own_add

First Normal Form
[Figure]

Second Normal Form
- Functional dependency is reduced by splitting data into like groups, possibly establishing a table to define the relationship between the first two
[Table diagram; fields shown: Parcel_ID, Alderman, Tship_ID, Tship_name, Thall_add, Own_ID, Own_name, Own_add]

Third Normal Form
[Figure]

Third Normal Form
- Transitive functional dependencies are removed
- Tables shown:
  - Parcel_ID, Own_ID
  - Parcel_ID, Alderman, Tship_ID
  - Own_ID, Own_name, Own_add
  - Tship_ID, Tship_name, Thall_add

Summary of Databases and GIS
- Most GIS packages still use a hybrid solution: spatial data + attribute data (Arc + Info)
- The emergence of spatial databases is changing this. Many DBMSs now support spatial data: Oracle, DB2, MS SQL Server (commercial), and MySQL, PostgreSQL (open source)
- Geometric (spatial) data
  - Usually hierarchical
  - Invisible to the user
- Attribute data
  - Almost entirely relational
  - Manipulated by the user

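The decomposition into third normal form can be sketched in plain Python. The field names follow the parcel example in the slides; the sample values (apart from township 12 being named Birch), the helper `project`, and the exact grouping of fields into tables are assumptions for illustration.

```python
# A flat parcel table with repeated township and owner information.
flat_parcels = [
    {"Parcel_ID": 1, "Alderman": "Smith", "Tship_ID": 12, "Tship_name": "Birch",
     "Thall_add": "1 Main St", "Own_ID": 100, "Own_name": "Lee", "Own_add": "5 Oak Ave"},
    {"Parcel_ID": 1, "Alderman": "Smith", "Tship_ID": 12, "Tship_name": "Birch",
     "Thall_add": "1 Main St", "Own_ID": 101, "Own_name": "Cruz", "Own_add": "9 Elm Rd"},
    {"Parcel_ID": 2, "Alderman": "Jones", "Tship_ID": 12, "Tship_name": "Birch",
     "Thall_add": "1 Main St", "Own_ID": 100, "Own_name": "Lee", "Own_add": "5 Oak Ave"},
]

def project(rows, fields):
    """Return the distinct combinations of `fields`, in first-seen order."""
    seen = {}
    for row in rows:
        key = tuple(row[f] for f in fields)
        seen.setdefault(key, dict(zip(fields, key)))
    return list(seen.values())

# Each table keeps only the fields that depend directly on its own key.
parcels      = project(flat_parcels, ["Parcel_ID", "Alderman", "Tship_ID"])
townships    = project(flat_parcels, ["Tship_ID", "Tship_name", "Thall_add"])
owners       = project(flat_parcels, ["Own_ID", "Own_name", "Own_add"])
parcel_owner = project(flat_parcels, ["Parcel_ID", "Own_ID"])  # junction table
```

After the split, the township name Birch is stored once instead of three times, and an owner's address only needs to be updated in one place.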
Attribute Data and Spatial Data
- Arc/Info: hybrid name, the history of ArcGIS.
- Without attribute data, spatial data will be of limited use.

Geodatabase
- What is a Geodatabase?
- Types of Geodatabase
- Geodatabase objects

What is a geodatabase?
A geodatabase (short for geographic database) is a physical store of geographic information (spatial, attribute, metadata, and relationships) inside a relational database management system (RDBMS).

Geodatabase Types Since Ver. 9.2
(in order of increasing size and functionality)
- Personal Geodatabase: for Microsoft Access
- File Geodatabase (new since V9.2)
- Workgroup Geodatabase (new since V9.2): SQL Server Express
- Enterprise Geodatabase: 5 supported DBMSs: DB2, Informix, Oracle, MS SQL Server, PostgreSQL

What does a Geodatabase look like?
[Figures]

Personal GeoDatabase (Access)
[Figure]

Geodatabase (file based)
[Figure]

Geodatabase objects
- Basic objects: feature classes, feature datasets, nonspatial tables.
- Complex objects building on the basic objects: topology, relationship classes, geometric networks

Feature classes
- A feature class is a collection of geographic features of the same geometry type; types include point, line, polygon, and annotation feature classes.
- Feature classes may exist independently in a geodatabase as stand-alone feature classes, or you can group them into feature datasets.
- The SouthAmerica geodatabase contains four stand-alone feature classes: a point feature class of cities, a dimension feature class of distances between cities, a polygon feature class of countries, and an annotation feature class of country names.
Source: www.esri.com

Feature datasets
- A feature dataset is composed of feature classes that have been grouped together so they can participate in topological relationships with each other.
- All the feature classes in a feature dataset must share the same spatial reference (or coordinate system).
- Edits you make to one feature class may result in edits being made automatically to some or all of the other feature classes in the feature dataset.
- In the CityWater geodatabase, three point feature classes and one line feature class were grouped into the PublicWater feature dataset to create a geometric network called WaterNet.
Source: www.esri.com

Tables
- Two types: feature class tables and nonspatial attribute tables.
- Both types of tables are created and managed in ArcCatalog and edited in ArcMap. Both display in the traditional row and column format.
- The difference is that feature class tables have one or more columns that store feature geometry.
- Nonspatial tables contain only attribute data (no feature geometry) and display in ArcCatalog with the table icon. They can exist in a geodatabase as stand-alone tables, or they can be related to other tables or feature classes.
- The cfcc_desc table in the SantaBarbara geodatabase contains attribute data for the Roads feature class (stored inside the Roads feature dataset).
Source: www.esri.com

Organizing Geographic Features
- Feature: a geographic representation of a spatial object
- Features: one row in a table represents one feature
- Feature Classes: one table or more than one table
- Feature Dataset: a set of feature classes

GeoDatabase Elements
[Diagram labels: Geodatabase, Feature dataset, Geometric network, Feature class, Relationship class, Table, Annotation class]

Topology
- In a GIS, spatial relationships among feature classes in a feature dataset are defined by topology. You can choose whether to create topology for features.
- The primary spatial relationships that you can model using topology are adjacency, coincidence, and connectivity.
- There are three types of topology available in the geodatabase: geodatabase topology (over 20 topology rules), map topology, and geometric network topology.
- Each type of topology is created from feature classes that are stored within a feature dataset. A feature class can participate in only one topology at a time.

Example of Topology in a Geodatabase
[Figure]

Geometric Networks
- In the real world, examples of networks abound: streams joining together to form larger streams, pipes carrying water to homes and businesses throughout a city, and power lines carrying electricity.
- In a geodatabase, you can model each of these real-world networks with a geometric network. Starting with simple point and line feature classes, you use ArcCatalog to create a geometric network that will enable you to answer questions such as:
  - Which streams will be affected by a proposed dam?
  - Which areas will be affected by a water main repair?
  - What is the quickest route between two points in the network?
Source: www.esri.com

Geometric Network example
[Diagram labels: Feature Classes (Valve, Service, Feed, Lateral, Main), Geometric Network]
Source: ESRI European User Conference

Relationship Classes
- In a geodatabase, relationship classes provide a way to model real-world relationships that exist between objects such as parcels and buildings or streams and water sample data.
- By using relationship classes, you can make your GIS database more accurately reflect the real world and facilitate data maintenance.
- The relationships stored in a relationship class can be between two feature classes (such as buildings and parcels, top) or between a feature class and a nonspatial attribute table (such as streams and water quality sampling data, bottom).
Source: www.esri.com

ESRI data models
- Provided by ESRI: http://support.esri.com/index.cfm?fa=downloads.datamodels.gateway
- Goal: provide a practical template for implementing GIS projects
- Start to think about your final project now
- Great starting point for your GIS project

Industry-specific Data models
- Address
- Agriculture
- Archiving
- Atmospheric
- Basemap
- Biodiversity
- Census Administrative Boundaries
- Defense Intel
- Energy Utilities
- Energy Utilities MultiSpeak(TM)
- Environmental Regulated Facilities
- Forestry
- Geology
- GIS for the Nation
- Groundwater
- Health
- Historic Preservation and Archaeology
- Homeland Security
- Hydro
- International Hydrographic Organization (IHO) S-57 for ENC
- Land Parcels
- Local Government
- Marine
- National Cadastre
- Petroleum
- Pipeline
- Raster
- Telecommunications
- Transportation
- Water Utilities
From: http://support.esri.com/index.cfm?fa=downloads.datamodels.gateway

Data model: national GIS
- You can download it directly from the ESRI website
[Figure]

Queries
- Definition: a query is the action or result of selecting a subset of records based on specific attribute values
- General categories:
  - Attribute (Tabular)
  - Spatial
- Two main methods:
  - Boolean operators (AND, OR, NOT)
  - SQL operators (<, >, +, =)
- Structured Query Language (SQL)
  - The mathematical basis of relational databases led to a standard language for querying data (SQL) that uses simple mathematical operators
  - Relational databases allow the user to nest operations for complex queries

Database Use: Structured Query Language (SQL)
- Set Operators
  - =   Equal
  - <>  Not Equal
  - <   Less Than
  - >   Greater Than
  - <=  Less Than or Equal
  - >=  Greater Than or Equal
- Relational Operators
  - Union
  - Intersection
  - Difference
  - Product
- Aggregate Functions (Summarize)
  - Sum of values for all rows for a given column
  - Average of a given column
  - Column Maximum
  - Column Minimum
  - Number of Rows (Count) that Satisfy a Condition

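The comparison operators, a relational operator, and the aggregate functions listed above can be tried out with sqlite3. The cities table and its three rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (city TEXT, state TEXT, pop INTEGER);
INSERT INTO cities VALUES
    ('Alpha', 'NY', 120000),
    ('Beta',  'NY',  80000),
    ('Gamma', 'NJ', 250000);
""")

# Aggregate functions summarize one column over all rows.
total, average, largest, smallest = conn.execute(
    "SELECT SUM(pop), AVG(pop), MAX(pop), MIN(pop) FROM cities"
).fetchone()

# COUNT with a condition: number of rows that satisfy it.
(big_cities,) = conn.execute(
    "SELECT COUNT(*) FROM cities WHERE pop > 100000"
).fetchone()

# "<>" is the not-equal operator.
not_ny = [r[0] for r in conn.execute("SELECT city FROM cities WHERE state <> 'NY'")]

# INTERSECT keeps only rows returned by both queries.
big_and_ny = [r[0] for r in conn.execute("""
    SELECT city FROM cities WHERE pop > 100000
    INTERSECT
    SELECT city FROM cities WHERE state = 'NY'
""")]
```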
A Nested SQL Statement
- Two statements: Females / Pop1990 < 0.55, then Pop1990 > 100000 and Pop1990 < 200000
- Or, one statement: (Females/Pop1990 < 0.55) and ((Pop1990 > 100000) and (Pop1990 < 200000))
- Compute all instances where the % of females in the 1990 population is less than 55%
- Then identify all population centers for the 1990 census where this is true and the population is larger than 100,000 and less than 200,000

Spatial Queries
- Point Queries: what is at a particular location?
- Range Queries: what is in a particular area?
- Nearest Neighbor Queries: where is the nearest object to a particular location?
- Spatial Join Queries: where are the areas that have water supply and power supply?
- Spatial Aggregate Queries: where is the most populated region?

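The nested statement from the slide can be run against a small table with sqlite3. The field names Females and Pop1990 come from the slide; the table name `centers` and the four rows are hypothetical. One practical caveat of this sketch: with integer columns, SQLite performs integer division, so the ratio is multiplied by 1.0 to force a real-valued result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE centers (name TEXT, Females INTEGER, Pop1990 INTEGER);
INSERT INTO centers VALUES
    ('A',  78000, 150000),   -- 52% female, population in range
    ('B',  90000, 150000),   -- 60% female
    ('C', 250000, 500000),   -- 50% female, population too large
    ('D',  45000,  90000);   -- 50% female, population too small
""")

# One statement combining both conditions.
one_statement = [r[0] for r in conn.execute("""
    SELECT name FROM centers
    WHERE (Females * 1.0 / Pop1990 < 0.55)
      AND ((Pop1990 > 100000) AND (Pop1990 < 200000))
""")]

# Two statements, nested: the inner query selects by share of females,
# the outer query then filters that subset by population.
nested = [r[0] for r in conn.execute("""
    SELECT name FROM (
        SELECT * FROM centers WHERE Females * 1.0 / Pop1990 < 0.55
    )
    WHERE Pop1990 > 100000 AND Pop1990 < 200000
""")]
```

Both forms select the same records; the nested form makes the two-step logic explicit.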
Common Attribute Operations
- Queries: selection operations that produce data subsets
- Join and Relate: bringing data together (one table with non-spatial attributes, one table with features)

Relations
- General categories:
  - Tabular: based on some information within the attribute tables, e.g., a common field
  - Spatial: based on location (nearest, within, or aggregate)
  - Geodatabase: relationship class
- Strategies:
  - Join
  - Relate
[Diagram labels: County, Person, Age, Polygon_id = 157, Gpsid = 29, LC = Agriculture]

Join vs. Relate
- Join
  - Appends fields from the second table, with data for each record where a key field match is found (empty otherwise)
  - For 1:1 or M:1 only
  - In 1:M or M:M, it stops at the first hit (can't add rows/records for additional relationships)
- Relate
  - Allows automatic access to a related table's records; keeps tables physically separate
  - For 1:M or M:M
  - Doesn't add records to the layer's table, so it is not limited by the initial table's size

Forest-ID  ForestName
1          Nantahala
2          Cherokee

Forest-ID  Trail_Name       Features  Trailhead
1          Bryson's Knob    Vista     X1, Y1
2          Slickrock Falls  Ogrth     X2, Y2
1          North Fork       Wfall     X3, Y3
2          Cade's Cave      Wlife     X4, Y4
1          Appalachian      Cmp       X5, Y5

Spatial Join
- Relates two attribute tables via a shared location in space, rather than a common field
- Concept: a type of join operation in which fields from one layer's attribute table are appended to another layer's attribute table based on the relative locations of the features in the two layers.
- Examples
  - Finding the nearest feature
  - Finding what's inside a polygon
  - Finding what intersects a feature
  - Where is there an incompatibility between zoning and current land use?
  - How many Toxic Release Inventory sites are there in each county, and what are the total releases per county?

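The difference between a join and a relate can be sketched in plain Python using the forest and trail tables from the Join vs. Relate slide. This only mimics the behavior described there; it is not how ArcGIS implements either operation, and the variable names are assumptions.

```python
forests = [
    {"Forest_ID": 1, "ForestName": "Nantahala"},
    {"Forest_ID": 2, "ForestName": "Cherokee"},
]
trails = [
    {"Forest_ID": 1, "Trail_Name": "Bryson's Knob",   "Features": "Vista"},
    {"Forest_ID": 2, "Trail_Name": "Slickrock Falls", "Features": "Ogrth"},
    {"Forest_ID": 1, "Trail_Name": "North Fork",      "Features": "Wfall"},
    {"Forest_ID": 2, "Trail_Name": "Cade's Cave",     "Features": "Wlife"},
    {"Forest_ID": 1, "Trail_Name": "Appalachian",     "Features": "Cmp"},
]

# Join, M:1 (trails -> forests): every trail row gains the forest's fields.
forest_by_id = {f["Forest_ID"]: f for f in forests}
trails_joined = [{**t, "ForestName": forest_by_id[t["Forest_ID"]]["ForestName"]}
                 for t in trails]

# Join, 1:M (forests -> trails): only the first matching trail can be appended,
# because the forest table cannot grow extra rows.
forests_joined = []
for f in forests:
    first_hit = next((t for t in trails if t["Forest_ID"] == f["Forest_ID"]), None)
    forests_joined.append({**f, "Trail_Name": first_hit["Trail_Name"] if first_hit else None})

# Relate, 1:M: tables stay separate; a lookup returns all related records.
related_trails = {}
for t in trails:
    related_trails.setdefault(t["Forest_ID"], []).append(t["Trail_Name"])
```

The 1:M join keeps one trail per forest and silently drops the others, which is why a relate is the better fit for that cardinality.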
Spatial Join
URL: http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?topicname=learn%20more%20about%20spatial%20relationships

Spatial Join: An example
- Input layers: population in US cities; US states
- Objective: the city population in each state; how many cities in each state
- Solutions:
  - Count/summarize manually?
  - Simple query in a DBMS (very easy, but limited by data organization in many cases):
      select state, sum(pop) as citypop, count(city) as totalcity from cities group by state
  - Spatial join

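When the cities table has no state field, the same summary can be obtained from location alone. The following is a minimal sketch of a point-in-polygon spatial join in plain Python, assuming hypothetical square "states" and made-up city coordinates and populations; a GIS would use its own spatial join tool rather than this hand-written ray-casting test.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a ray to the right crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical layers: two square states and three cities (x, y, population).
states = {
    "West": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "East": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
cities = [
    {"city": "A", "x": 2,  "y": 3, "pop": 100},
    {"city": "B", "x": 5,  "y": 5, "pop": 200},
    {"city": "C", "x": 15, "y": 5, "pop": 300},
]

# Spatial join: attach each city to the state polygon that contains it,
# then summarize per state (same result as the GROUP BY query).
summary = {name: {"citypop": 0, "totalcity": 0} for name in states}
for c in cities:
    for name, polygon in states.items():
        if point_in_polygon(c["x"], c["y"], polygon):
            summary[name]["citypop"] += c["pop"]
            summary[name]["totalcity"] += 1
            break
```

Points that fall exactly on a shared boundary need an explicit rule in practice; this sketch avoids that case in its sample data.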
Spatial Join Example
[Figure]

What did we learn today?
- Introduction to Database Management System (DBMS)
  - 4 types of DBMS
- Relational Database Basics
  - 2 keys
  - 3 relations
  - Data normalization (form analysis)
- Spatial Database Overview and Operation
  - Join vs. relate
  - Spatial join