INTRODUCTION TO PROC SQL
JEFF SIMPSON, SYSTEMS ENGINEER
THE SQL PROCEDURE
The SQL procedure:
  - enables the use of SQL in SAS
  - is part of Base SAS software
  - follows American National Standards Institute (ANSI) standards
  - includes enhancements for compatibility with SAS software

THE SQL PROCEDURE
The SQL procedure is:
  - a tool for querying data
  - a tool for data manipulation and management
  - an augmentation to the DATA step
The SQL procedure is not:
  - a DATA step replacement
  - a custom reporting tool
STRUCTURED QUERY LANGUAGE
[Diagram: PROC SQL takes input from a SAS data set, SAS data view, or DBMS table, and produces output as a report, SAS data set, SAS data view, or DBMS table.]
SELECT STATEMENT SYNTAX
General form of the SELECT statement with selected clauses:
  SELECT column-1<,...column-n>
    FROM table-1|view-1<,...table-n|view-n>
    <WHERE expression>
    <GROUP BY column-1<,...column-n>>
    <HAVING expression>
    <ORDER BY column-1<DESC><,...column-n>>;
SELECT STATEMENT SYNTAX
To help you remember the clause order: So Few Workers Go Home On Time;
  So       SELECT
  Few      FROM
  Workers  WHERE
  Go       GROUP BY
  Home     HAVING
  On Time  ORDER BY
SELECT STATEMENT SYNTAX
Order of execution for SQL clauses:
  5  SELECT column-1<,...column-n>
  1    FROM table-1|view-1<,...table-n|view-n>
  2    <WHERE expression>
  3    <GROUP BY column-1<,...column-n>>
  4    <HAVING expression>
  6    <ORDER BY column-1<DESC><,...column-n>>;
ORDER OF EXECUTION
  - The FROM clause is processed first. This clause identifies the data sets from which SQL is reading.
  - The WHERE clause is a pre-processor. As with the DATA step, the WHERE clause searches the original data set(s) for values that meet the WHERE condition. These results are stored in an intermediate table.
  - The GROUP BY clause is processed next. It creates groupings within the data.

ORDER OF EXECUTION
  - The HAVING clause is processed next. This clause subsets the groups that were created by the GROUP BY clause. It also builds an intermediate table.
  - The SELECT clause selects the columns for the result set.
  - The ORDER BY clause orders (sorts) the rows.
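The processing order described above can be sketched with a single query. This is a minimal illustration, assuming the SASHELP.CLASS sample table that ships with SAS; the column aliases (n_students, avg_height) and the filter values are chosen only for this example. The numbered comments mark the order in which each clause is processed, not the order in which it is written.

```sas
/* Assumption: SASHELP.CLASS stands in for real data. */
proc sql;
  select sex,                           /* 5: choose columns for the result set */
         count(*)    as n_students,
         avg(height) as avg_height format=6.1
    from sashelp.class                  /* 1: identify the input table          */
    where age >= 12                     /* 2: subset rows before grouping       */
    group by sex                        /* 3: form groups                       */
    having count(*) > 1                 /* 4: subset the groups                 */
    order by avg_height desc;           /* 6: sort the final rows               */
quit;
```

Because WHERE runs before GROUP BY, it cannot test a summary value such as count(*); that kind of condition belongs in HAVING.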
DEMONSTRATION
Let's use SQL to query and manipulate a data set.
COMBINING DATA FROM MULTIPLE TABLES
SQL uses joins to combine tables horizontally.
[Diagram: Table A and Table B]

TYPES OF JOINS
PROC SQL supports two types of joins:
  - inner joins
  - outer joins

INNER JOINS
Inner joins:
  - return only matching rows
  - enable a maximum of 256 tables to be joined at the same time

TYPES OF JOINS
Outer joins:
  - return all matching rows, plus nonmatching rows from one or both tables
  - can be performed on only two tables or views at a time
[Diagram: left, full, and right joins]
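As a rough sketch of the left and right forms, the code below builds two small tables and joins them both ways. The table names ONE and TWO and their values are illustrative assumptions, chosen so that only X=2 appears in both tables.

```sas
/* Illustrative tables: only x=2 appears in both. */
data one;
  input x a $;
  datalines;
1 a
4 d
2 b
;
run;

data two;
  input x b $;
  datalines;
2 x
3 y
5 v
;
run;

proc sql;
  title 'Left join: every row of ONE, plus matches from TWO';
  select one.x, a, b
    from one left join two
    on one.x = two.x;

  title 'Right join: every row of TWO, plus matches from ONE';
  select two.x, a, b
    from one right join two
    on one.x = two.x;
quit;
title;   /* clear the title */
```

In the left join, B is missing for the rows of ONE that have no match; in the right join, A is missing for the rows of TWO that have no match.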
CARTESIAN PRODUCT
To understand how SQL processes a join, it is important to understand the concept of the Cartesian product. A query that lists multiple tables in the FROM clause without a WHERE clause produces all possible combinations of rows from all tables. This result is called the Cartesian product.
  select *
    from one, two;
CARTESIAN PRODUCT
  Table One      Table Two
  X  A           X  B
  1  a           2  x
  4  d           3  y
  2  b           5  v

  Result Set
  X  A  X  B
  1  a  2  x
  1  a  3  y
  1  a  5  v
  4  d  2  x
  4  d  3  y
  4  d  5  v
  2  b  2  x
  2  b  3  y
  2  b  5  v
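The result set above can be reproduced with the sketch below, which rebuilds tables ONE and TWO from the values shown on the slide. The second query is an added illustration: it applies a WHERE condition to the same FROM clause, which reduces the nine combinations to the single matching row and is one way to write an inner join.

```sas
/* Rebuild the two tables shown on the slide. */
data one;
  input x a $;
  datalines;
1 a
4 d
2 b
;
run;

data two;
  input x b $;
  datalines;
2 x
3 y
5 v
;
run;

proc sql;
  /* Cartesian product: 3 rows x 3 rows = 9 rows. */
  /* SAS writes a note to the log about the Cartesian product. */
  select *
    from one, two;

  /* Adding a WHERE condition keeps only matching rows (x=2). */
  select *
    from one, two
    where one.x = two.x;
quit;
```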
DEMONSTRATION Write a query that performs an inner join.
OUTER JOINS
Full join: retrieves the matching rows as well as the non-matches from the left table and the non-matches from the right table.
FULL JOIN
  Table One      Table Two
  X  A           X  B
  1  a           2  x
  4  d           3  y
  2  b           5  v

  select *
    from one full join two
    on one.x = two.x;

  X  A  X  B
  1  a  .
  2  b  2  x
  .     3  y
  4  d  .
  .     5  v
SQL FULL JOIN
SQL joins do not automatically overlay same-named columns.
  Table One      Table Two
  X  A           X  B
  1  a           2  x
  4  d           3  y
  2  b           5  v

  proc sql;
    select one.x, a, b
      from one full join two
      on one.x = two.x;
  quit;

  Output
  X  A  B
  1  a
  2  b  x
  .     y
  4  d
  .     v
THE COALESCE FUNCTION
The COALESCE function returns the value of the first non-missing argument.
General form of the COALESCE function:
  COALESCE(argument-1,argument-2<,...argument-n>)
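A small sketch of COALESCE on its own, before it is used in a join. The table WORK.DEMO and its columns (id, x1, x2) are hypothetical and exist only for this illustration; the final argument 0 acts as a fallback when every column is missing.

```sas
/* Hypothetical data: a period (.) is a missing numeric value. */
data work.demo;
  input id x1 x2;
  datalines;
1 . 10
2 5 20
3 . .
;
run;

proc sql;
  /* id=1 returns 10, id=2 returns 5, id=3 returns 0 */
  select id,
         coalesce(x1, x2, 0) as first_nonmissing
    from work.demo;
quit;
```

All arguments to COALESCE must be of the same type, either all numeric or all character.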
SQL FULL JOIN
You can use the COALESCE function to overlay columns.
  Table One      Table Two
  X  A           X  B
  1  a           2  x
  4  d           3  y
  2  b           5  v

  proc sql;
    select coalesce(one.x,two.x) as x, a, b
      from one full join two
      on one.x = two.x;
  quit;

  Output
  X  A  B
  1  a
  2  b  x
  3     y
  4  d
  5     v
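SQL JOIN VERSUS DATA STEP MERGE
The DATA step code producing similar results:
  Table One      Table Two
  X  A           X  B
  1  a           2  x
  4  d           3  y
  2  b           5  v

  /* ONE must be sorted by X; TWO is already in X order. */
  proc sort data=one;
    by x;
  run;

  data three;
    merge one two;
    by x;
  run;

  Output (data set THREE)
  X  A  B
  1  a
  2  b  x
  3     y
  4  d
  5     v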
SQL JOIN VERSUS DATA STEP MERGE
  Key Points                                        SQL Join       DATA Step Merge
  Explicit sorting of data before join/merge        Not required   Required
  Same-named columns in join/merge expressions      Not required   Required
  Equality in join or merge expressions             Not required   Required
PROC SQL AND MACRO VARIABLES
PROC SQL creates or updates macro variables using an INTO clause. The INTO clause has several forms, and each produces a different result.

PROC SQL AND MACRO VARIABLES
The following syntax places values from the first row returned by an SQL query into one or more macro variables. Data from additional rows returned by the query is ignored.
General form of the SELECT statement with an INTO clause:
  SELECT column-1<,...column-n>
    INTO :macvar_1<,...:macvar_n>
    FROM table|view
The value from the first column in the SELECT list is placed in the first macro variable listed in the INTO clause, and so on.
SQL CREATING EXECUTION TIME MACRO VARIABLES
  proc sql noprint;
    select country, barrels
      into :country1, :barrels1
      from sql.oilrsrvs;
  quit;
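To illustrate how different forms of the INTO clause produce different results, the sketch below runs three variants against SASHELP.CLASS, used here as an assumed stand-in for the SQL.OILRSRVS table. The macro variable names (first_name, first_age, name1 through name3, name_list) are invented for this example.

```sas
/* Assumption: SASHELP.CLASS stands in for the presentation's data. */
proc sql noprint;
  /* Form 1: values from the first row only */
  select name, age
    into :first_name, :first_age
    from sashelp.class;

  /* Form 2: one macro variable per row, for the first three rows */
  select name
    into :name1 - :name3
    from sashelp.class;

  /* Form 3: all values concatenated into a single macro variable */
  select name
    into :name_list separated by ', '
    from sashelp.class;
quit;

/* Write the results to the SAS log. */
%put first_name=&first_name first_age=&first_age;
%put name1=&name1 name2=&name2 name3=&name3;
%put name_list=&name_list;
```

The NOPRINT option suppresses the query output, which is usually what you want when the query exists only to populate macro variables.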
SUPPORT.SAS.COM RESOURCES
Base SAS Documentation
  http://support.sas.com/documentation/onlinedoc/base/index.html
Papers & SAS Notes
  http://support.sas.com/resources/papers/sgf09/336-2009.pdf
  http://support.sas.com/kb/20/783.html
SAS Training
  https://support.sas.com/edu/schedules.html?id=336&ctry=us

SUPPORT.SAS.COM RESOURCES
RSS & Blogs
  http://support.sas.com/community/rss/index.html
  http://blogs.sas.com/sastraining/index.php?/plugin/tag/proc+sql
Discussion Forums
  http://support.sas.com/forums/index.jspa
MATERIALS Portions of the materials for this presentation were adapted from the following SAS training class: SAS SQL I: Essentials For more information on the topics covered in this class, please visit: https://support.sas.com/edu/schedules.html?id=336&ctry=us
THANK YOU FOR ATTENDING! sas.com