Set Operators in SQL
Joins in SQL are generally concerned with creating new rows/observations with columns from multiple tables, where some of the columns are expected to have matching values (and, of course, the various outer joins also keep mismatches). Set operators, named as such because they include operations like UNION and INTERSECT, are concerned with assembling a group of rows/observations from a set of tables for a chosen set of columns. In most cases, the column set is presumed to be common to all tables, but that is not a requirement. The choice of operator determines which rows are collected. To illustrate these operations, existing data sets will be sampled and reassembled under various conditions.
Starting Data
From the FLIGHTDELAYS data, three data sets will be created:

    /* one input row may be written to more than one output table */
    data notdelayed delayed delayedten;
      set mysas.flightdelays;
      if delay le 0 then output notdelayed;
      if delay gt 0 then output delayed;
      if delay ge 10 then output delayedten;
    run;

Clearly NOTDELAYED and DELAYED have disjoint sets of rows, as do NOTDELAYED and DELAYEDTEN. DELAYED and DELAYEDTEN are not disjoint: every row of DELAYEDTEN also appears in DELAYED.
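As a sketch of the same split on data that can be run anywhere, the block below replaces MYSAS.FLIGHTDELAYS with a hypothetical nine-row table TOYFLIGHTS (made-up DELAY values, one of them missing). Only the DATA step logic is taken from the slide.

```sas
/* Hypothetical toy stand-in for MYSAS.FLIGHTDELAYS: only a DELAY column, */
/* with one missing value (.) to show where it ends up.                   */
data toyflights;
  input delay @@;
  datalines;
-5 0 3 12 25 -1 9 10 .
;
run;

/* Same three-way split as above */
data notdelayed delayed delayedten;
  set toyflights;
  if delay le 0 then output notdelayed;  /* a missing DELAY also satisfies this */
  if delay gt 0 then output delayed;
  if delay ge 10 then output delayedten;
run;
```

Here NOTDELAYED gets 4 rows, DELAYED 5 and DELAYEDTEN 3, so NOTDELAYED and DELAYED together account for all 9 input rows and DELAYEDTEN sits inside DELAYED. The missing value lands in NOTDELAYED because SAS treats a missing numeric value as smaller than any number, which is worth checking before reading NOTDELAYED as "on time".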
Union
First, the SQL UNION operator:

    proc sql;
      create table union1 as
        select * from notdelayed
        union
        select * from delayed;
    quit;

The original 624 observations are reconstructed.
Union
UNION does what you would expect: it collects all rows from all tables, but removes duplicate rows, so a row present in both tables appears only once:

    proc sql;
      create table union2 as
        select * from delayedten
        union
        select * from delayed;
    quit;

This returns a table equivalent to the DELAYED table.
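The duplicate removal matters when a table repeats a row. The sketch below uses a hypothetical table TOYFLIGHTS in which DELAY = 12 occurs twice, and contrasts UNION with UNION ALL (the output table names UNION_DISTINCT and UNION_ALL are also made up for illustration).

```sas
/* Hypothetical toy data with a repeated row (DELAY = 12 appears twice) */
data toyflights;
  input delay @@;
  datalines;
-5 0 3 12 12 25
;
run;

data delayed delayedten;
  set toyflights;
  if delay gt 0 then output delayed;     /* 3, 12, 12, 25 */
  if delay ge 10 then output delayedten; /* 12, 12, 25    */
run;

proc sql;
  /* UNION: duplicate rows removed, leaving 3 rows (3, 12, 25) */
  create table union_distinct as
    select * from delayedten
    union
    select * from delayed;
  /* UNION ALL: plain concatenation, 3 + 4 = 7 rows */
  create table union_all as
    select * from delayedten
    union all
    select * from delayed;
quit;
```

UNION_DISTINCT has 3 rows while DELAYED has 4, so "equivalent to DELAYED" holds only when DELAYED itself contains no duplicate rows.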
Intersection
INTERSECT does with the rows what you would typically expect from its definition: it collects the rows common to all tables (again removing duplicates):

    proc sql;
      create table intersect1 as
        select * from delayedten
        intersect
        select * from delayed;
    quit;

As expected, this is equivalent to the DELAYEDTEN data table. Since NOTDELAYED and DELAYED are disjoint, the next intersection is empty:

    proc sql;
      create table intersect2 as
        select * from notdelayed
        intersect
        select * from delayed;
    quit;

No rows, as would be expected.
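Both intersection results can be reproduced on a hypothetical eight-row table TOYFLIGHTS (made-up DELAY values standing in for MYSAS.FLIGHTDELAYS), split exactly as on the Starting Data slide:

```sas
/* Hypothetical toy stand-in for MYSAS.FLIGHTDELAYS */
data toyflights;
  input delay @@;
  datalines;
-5 0 3 12 25 -1 9 10
;
run;

data notdelayed delayed delayedten;
  set toyflights;
  if delay le 0 then output notdelayed;
  if delay gt 0 then output delayed;
  if delay ge 10 then output delayedten;
run;

proc sql;
  /* DELAYEDTEN is a subset of DELAYED: result has DELAYEDTEN's 3 rows */
  create table intersect1 as
    select * from delayedten intersect select * from delayed;
  /* NOTDELAYED and DELAYED are disjoint: result has 0 rows */
  create table intersect2 as
    select * from notdelayed intersect select * from delayed;
quit;
```

An empty INTERSECT still creates the table; it simply has zero observations.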
Two Random Samples
From the FLIGHTDELAYS data, two random samples will be created, each selecting an observation with probability 0.25:

    data delay1 delay2;
      set mysas.flightdelays;
      if ranuni(1) le .25 then output delay1;
      if ranuni(2) le .25 then output delay2;
    run;

What is the probability that an observation is selected for both data sets?
Intersection
Here we would expect a nonempty intersection:

    proc sql;
      create table intersect3 as
        select * from delay1
        intersect
        select * from delay2;
    quit;
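Treating the two RANUNI draws as independent uniform values (an assumption about the generator, not something stated on the slides) and using the 624 observations reported earlier, the selection probability and the expected size of INTERSECT3 work out as:

```latex
P(\text{in DELAY1 and DELAY2}) = 0.25 \times 0.25 = 0.0625,
\qquad
E[\text{rows in INTERSECT3}] = 624 \times 0.0625 = 39
```

The observed count will vary around this value from sample to sample.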
EXCEPT
EXCEPT is another available set operator; it is equivalent to an intersection with a complement (A EXCEPT B keeps the rows of A that are not in B):

    proc sql;
      create table except1 as
        select * from delay1
        except
        select * from delay2;
    quit;

What are these 102 rows? They are the rows in DELAY1 that are not in DELAY2.

There is an additional subtlety here: EXCEPT removes duplicates from all tables before completing its operation. EXCEPT ALL preserves duplicates.
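A minimal sketch of the EXCEPT versus EXCEPT ALL difference, using two hypothetical one-column tables SAMPLE1 and SAMPLE2 (names and values made up). SAMPLE1 contains the value 1 twice; the value 3 appears once in each table and is removed in both cases.

```sas
/* Hypothetical toy tables; SAMPLE1 contains the value 1 twice */
data sample1;
  input x @@;
  datalines;
1 1 2 3
;
run;

data sample2;
  input x @@;
  datalines;
3 4
;
run;

proc sql;
  /* EXCEPT: duplicates removed first, leaving 2 rows (1, 2) */
  create table except_distinct as
    select * from sample1 except select * from sample2;
  /* EXCEPT ALL: duplicates kept, leaving 3 rows (1, 1, 2) */
  create table except_all as
    select * from sample1 except all select * from sample2;
quit;
```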
In Practice, More Care Must Be Taken
Suppose the data tables in question were a bit less clean. Because each FORMAT statement comes before SET, it fixes a different column order in each table:

    data delaya;
      format delay 5. date mmddyy10. destination origin $3.;
      set mysas.flightdelays;
      if ranuni(1) le .25;  /* subsetting IF: keep about 25% of rows */
      keep delay date destination origin;
    run;

    data delayb;
      format date mmddyy10. delay 5. origin destination $3.;
      set mysas.flightdelays;
      if ranuni(2) le .25;
      keep delay date destination origin;
    run;
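The reordering comes from the fact that variables enter the program data vector in the order the compiler first sees them, so a FORMAT statement placed before SET sets the column order. The sketch below checks this on a hypothetical one-row table TOYFLIGHTS (made-up values) by reading the column positions from DICTIONARY.COLUMNS.

```sas
/* Hypothetical one-row stand-in for MYSAS.FLIGHTDELAYS */
data toyflights;
  input date :mmddyy10. origin $ destination $ delay;
  format date mmddyy10.;
  datalines;
01/15/2020 ATL BOS 5
;
run;

/* FORMAT before SET: columns are created in FORMAT-statement order */
data delaya;
  format delay 5. date mmddyy10. destination origin $3.;
  set toyflights;
run;

data delayb;
  format date mmddyy10. delay 5. origin destination $3.;
  set toyflights;
run;

/* List each table's columns by position */
proc sql;
  select memname, name, varnum
    from dictionary.columns
    where libname = 'WORK' and memname in ('DELAYA', 'DELAYB')
    order by memname, varnum;
quit;
```

DELAY should be the first column of DELAYA but the second column of DELAYB, even though both tables hold the same data.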
Union
Applying the UNION operator to these:

    proc sql;
      create table try1 as
        select * from delaya
        union
        select * from delayb;
    quit;

There appears to be a problem here: the DATE values from DELAYB land in the DELAY column, and its ORIGIN values land in DESTINATION. No error is raised because the mismatched columns happen to have the same types.
Column Alignment
For each of these operators, the alignment of the columns is positional, i.e., column order matters. To overcome this:
- Be precise in the SELECT clause, especially if column names do not match.
- Use the CORRESPONDING option.
As would be expected, attempting to match a numeric column with a character column will produce an error.
In Practice
A couple of possible solutions to the previous problem. Either list the columns in the same order in each SELECT:

    proc sql;
      create table try2 as
        select date, origin, destination, delay from delaya
        union
        select date, origin, destination, delay from delayb;
    quit;

Or let CORRESPONDING match the columns by name:

    proc sql;
      create table try3 as
        select * from delaya
        union corresponding
        select * from delayb;
    quit;
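A minimal sketch of positional versus name-based matching, using two hypothetical tables POSA and POSB that hold the same columns X and Y in opposite orders (all names and values are made up):

```sas
/* Hypothetical toy tables: same columns, different order */
data posa;
  input x y;
  datalines;
1 100
;
run;

data posb;
  input y x;
  datalines;
200 2
;
run;

proc sql;
  /* Positional: POSB's first column (y = 200) lands under X */
  create table by_position as
    select * from posa union select * from posb;
  /* By name: CORRESPONDING pairs X with X and Y with Y */
  create table by_name as
    select * from posa union corresponding select * from posb;
quit;
```

BY_POSITION ends up with the row (X = 200, Y = 2), while BY_NAME holds the intended (X = 2, Y = 200).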
In Practice, More Care Must Be Taken
Suppose the data tables in question were even less clean, so that they no longer share the same columns (DELAYII has no DESTINATION column):

    data delayi;
      format delay 5. date mmddyy10. destination origin $3.;
      set mysas.flightdelays;
      if ranuni(1) le .25;
      keep delay date destination origin;
    run;

    data delayii;
      format date mmddyy10. delay 5. origin $3.;
      set mysas.flightdelays;
      if ranuni(2) le .25;
      keep delay date origin;  /* DESTINATION dropped */
    run;
CORRESPONDING
With UNION, CORRESPONDING will not preserve mismatched columns, but with OUTER UNION it will:

    proc sql;
      create table try4 as
        select * from delayi
        union corresponding
        select * from delayii;
    quit;

Outer union:

    proc sql;
      create table try5 as
        select * from delayi
        outer union corresponding
        select * from delayii;
    quit;

In TRY5, the rows that come from DELAYII have missing values for DESTINATION.
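The difference can be checked on two hypothetical tables, ROUTES1 (ID, ORIGIN, DESTINATION) and ROUTES2 (ID, ORIGIN only), with made-up airport codes. The sketch counts the columns of each result in DICTIONARY.COLUMNS.

```sas
/* Hypothetical toy tables; ROUTES2 lacks DESTINATION */
data routes1;
  input id origin $ destination $;
  datalines;
1 ATL BOS
2 ORD DEN
;
run;

data routes2;
  input id origin $;
  datalines;
3 SFO
;
run;

proc sql;
  /* UNION CORRESPONDING: only the shared columns ID and ORIGIN survive */
  create table corr_union as
    select * from routes1 union corresponding select * from routes2;
  /* OUTER UNION CORRESPONDING: all columns kept, DESTINATION missing for ID 3 */
  create table corr_outer as
    select * from routes1 outer union corresponding select * from routes2;
quit;

proc sql;
  select memname, count(*) as n_columns
    from dictionary.columns
    where libname = 'WORK' and memname in ('CORR_UNION', 'CORR_OUTER')
    group by memname;
quit;
```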