Get Going with PROC SQL
Richard Severino, Convergence CT, Honolulu, HI


ABSTRACT

PROC SQL is the SAS System's implementation of Structured Query Language (SQL). PROC SQL can be used to retrieve or combine/merge data from tables or views as well as generate reports and summary statistics. With PROC SQL you can modify and update tables, create new variables on the fly, access data from a database and join SAS data sets to each other or to tables from a database. This tutorial will introduce the basics of PROC SQL as well as some advanced features to help you start using PROC SQL effectively.

INTRODUCTION

Structured Query Language, or SQL, is a standardized language for retrieving and updating data in tables. PROC SQL is the SAS System's implementation of SQL and allows the user to access data, generate reports and summary statistics, and perform data management tasks. The purpose of this tutorial is to introduce enough of the SQL procedure that the beginning user can use it effectively and go on to solve more complex problems with PROC SQL.

SQL TERMINOLOGY

While some of the SQL terminology used in PROC SQL is not the same as that used with Base SAS, the terms are often interchangeable. The distinct terminology comes from relational databases, around which SQL was developed. The table below shows some Base SAS terms and their analogous SQL terms.
    SAS           SQL               Description
    Data Set      Table             A data file
    View          View              A data file that can be read (viewed) but cannot be modified
    Observation   Row or Record     Records in the data file
    Variable      Column or Field   Examples: age, id, date, salary, flight, destination
    Merge         Join              Getting information from more than one data file or table and putting it together

BASIC SYNTAX

The basic syntax for PROC SQL is as follows:

    PROC SQL ;
       SELECT <field/column names>
       FROM <table names> ;
    QUIT;

PROC SQL begins with the PROC SQL; statement. Each PROC SQL statement may contain several sub-statements, or clauses. Unlike Base SAS, these clauses are not delimited by semicolons: a single semicolon ends the whole statement, however many clauses it includes. PROC SQL does not require a RUN; statement anywhere, but you should end it with a QUIT; statement.

In general, an SQL query consists of selecting fields or columns from a data source, which may be a single table or several tables that are to be joined. The complete PROC SQL syntax is available in the online documentation at support.sas.com.

WORKING WITH DATA

We will be using a dataset named MARCH, which consists of airline flight information for the month of March.

SELECT SOME DATA

Let's take a look at the data by sending it to the output window. The following code will send all the data to the output window:

    SELECT * from sql.march ;

The asterisk (*) in the SELECT statement is a wildcard meaning that all fields or columns will be selected from the table. The code above therefore selects every field in table MARCH and displays every record or row in the table. If we don't know how many records there are in MARCH, we should limit the number of records sent to the output window, in case the table has hundreds or thousands of records. The INOBS= option limits the number of records read from the data source, while the OUTOBS= option limits the number of records output by PROC SQL. The following code:

    proc sql INOBS=5;
       select * from sql.march ;

will result in the following output:

    Selecting All Fields from MARCH Table and Using INOBS=5

    flight  date   depart  orig  dest  miles  boarded  capacity
            MAR94    7:10  LGA   LAX
            MAR94   10:43  LGA   ORD
            MAR94    9:31  LGA   LON
            MAR94   12:19  LGA   FRA
            MAR94   15:35  LGA   YYZ

If we don't want to print all the fields in the table, we must name the fields we want in the SELECT statement. The following code selects three fields, flight, date and dest, and uses OUTOBS to limit the output.

    proc sql OUTOBS=5;
       select flight, date, dest
          from sql.march ;

The following warning will be printed in the Log:

    WARNING: Statement terminated early due to OUTOBS=5 option.

And the following will print in the Output window:

    Selecting All Fields from MARCH Table and Using OUTOBS=5

    flight  date   dest
            MAR94  LAX
            MAR94  ORD
            MAR94  LON
            MAR94  FRA
            MAR94  YYZ

INOBS and OUTOBS can be used together. Keep in mind that if INOBS is less than OUTOBS, the procedure will output only as many records as are specified in the INOBS option. You will find it valuable to check the log when working with PROC SQL.

SELECT DISTINCT

The field dest in the MARCH table holds the 3-character airport code for the destination of the flight. If we want a list of all the destinations in the table, we can use SELECT DISTINCT to obtain the unique values of dest. The following code will get such a list:

    select DISTINCT dest
       from sql.march;

Output:

    List of Destination Airports

    dest
    ----
    FRA
    LAX
    LON
    ORD
    PAR
    WAS
    YYZ

Notice that each airport code is listed only once. SELECT DISTINCT is a useful tool for finding out what unique values are stored in a field. If you specify more than one field or column in the SELECT DISTINCT statement, the query returns every distinct combination of values in the fields specified. The following query:

    select DISTINCT dest, capacity
       from sql.march;

results in the following output:

    List of Destination Airports and Flight Capacity

    dest  capacity
    FRA        250
    LAX        210
    LON        250
    ORD        210
    PAR        250
    WAS        180
    YYZ        178

Notice that a capacity of 250 is listed several times, but each time with a different destination.

WHERE: SUBSETTING

The WHERE clause is used to select rows or records whose field values meet a particular condition or set of conditions. To list all the flights where the destination was LAX, we add a WHERE clause as follows:

    title "Flights to LAX";
    select *
       from sql.march
       WHERE dest = "LAX" ;

A partial output listing for the preceding code is as follows:

    Flights to LAX

    flight  date   depart  orig  dest  miles  boarded  capacity
            MAR94    7:10  LGA   LAX
            MAR94    7:10  LGA   LAX
            MAR94    7:10  LGA   LAX
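The two clauses just described can be tried without access to the MARCH table. The following is a minimal sketch that builds a small stand-in table first; the table name work.flights_demo and its four rows are hypothetical and were made up for this illustration.

```sas
/* Hypothetical toy data: work.flights_demo is NOT the MARCH table. */
proc sql;
   create table work.flights_demo
      (flight char(3), dest char(3), capacity num);

   /* One VALUES clause per row; values follow the column order above. */
   insert into work.flights_demo
      values ('114', 'LAX', 210)
      values ('114', 'LAX', 210)
      values ('202', 'ORD', 210)
      values ('219', 'LON', 250);

   /* Each destination code is listed once, however often it occurs. */
   title 'Distinct destinations';
   select distinct dest
      from work.flights_demo;

   /* Only the rows that satisfy the condition are returned. */
   title 'Flights to LAX';
   select *
      from work.flights_demo
      where dest = 'LAX';
quit;
title;   /* clear the title so it does not carry over to later output */
```

With these four rows, the first query lists three destination codes and the second returns the two LAX rows. Note that the character constant in the WHERE clause must be quoted; an unquoted LAX would be read as a column name.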

NAMING, LABELING AND FORMATTING COLUMNS

The variables in the MARCH dataset are not labeled, and some of their names are not necessarily indicative of what the variables hold. In the SELECT statement we can rename columns, and we can add or change labels as well as formats. In the example that follows we rename the field dest to destination, flight to flight_num and date to depart_dt, we provide a label for date and miles, and we change the format of miles. The PROC SQL code and a partial listing of the output are shown below. Notice that the column headings in the output show the new names, labels and formats.

    select flight as flight_num,
           date as depart_dt label="Departure Date",
           dest as destination,
           miles label="Distance to Destination in Miles" format=comma6.0
       from sql.march;

Output:

    Rename, Label and Change Formats of Columns

                                          Distance to
                Departure                 Destination
    flight_num  Date       destination    in Miles
                MAR94      LAX            2,
                MAR94      ORD
                MAR94      LON            3,
                MAR94      FRA            3,
                MAR94      YYZ
                MAR94      PAR            3,
                MAR94      WAS            229

CREATING NEW COLUMNS

In the SELECT statement, you can create a new column that is calculated from one or more existing columns or fields in the data source you are querying. The MARCH dataset has the number of passengers that actually boarded each flight as well as the capacity of the flight. Suppose we want to display a column that shows the number of empty seats on each flight, and that we would also like to convert the distance from miles to kilometers. The following PROC SQL accomplishes this task.

    select flight label="Flight Number",
           date,
           dest,
           ROUND(miles* ) as kilometers label="Flight Distance in Km" format=comma6.0,
           boarded,
           capacity - boarded as empty label="Number of Empty Seats" format=4.0
       from sql.march ;

A partial listing of the output follows.

    Creating New Columns Calculated from Existing Columns

                            Flight               Number
    Flight                  Distance             of Empty
    Number   date    dest   in Km      boarded   Seats
             MAR94   LAX    3,
             MAR94   ORD    1,
             MAR94   LON    5,
             MAR94   FRA    6,

Notice that the headers for date and dest which we created earlier are no longer shown. That is because changing the name, label or format in the SELECT statement does not affect the attributes of the column in the permanent dataset; it only affects the output of that query.

CREATING AND ALTERING TABLES

Saving the results of a query to a table is accomplished by creating a table from the query using the following syntax:

    PROC SQL;
       CREATE TABLE new_table_name AS
       SELECT column_one, column_two, ...
       FROM source_table_name ;
    QUIT;

For example, if we wish to save the results of the query where the number of empty seats was calculated, we could run the following code:

    CREATE table sql.march2 AS
    select flight label="Flight Number",
           date,
           dest,
           ROUND(miles* ) as kilometers label="Flight Distance in Km" format=comma6.0,
           boarded,
           capacity - boarded as empty label="Number of Empty Seats" format=4.0
       from sql.march ;

    title "New Table: MARCH2";
    select * from sql.march2 ;

The following is the message printed in the log and a partial listing of the output:

    NOTE: Table SQL.MARCH2 created, with 46 rows and 6 columns.

    New Table: MARCH2

                            Flight               Number
    Flight                  Distance             of Empty
    Number   date    dest   in Km      boarded   Seats
             MAR94   LAX    3,
             MAR94   ORD    1,
             MAR94   LON    5,
             MAR94   FRA    6,
             MAR94   YYZ

You can create a new table that is empty, i.e. a table with no records, by copying the structure of an existing table or by specifying the column names, labels, formats and data types. Suppose you need to create a table identical to the MARCH table so that data for the month of April can be entered. It is very easy to copy the structure of an existing table. The following PROC SQL code will create a table named APRIL with all the same columns as MARCH, but without any records in the table:

    CREATE table work.april LIKE sql.march ;
    describe table work.april;

There will not be any output generated by the code above, but the log will show that the table was created with zero records. The DESCRIBE TABLE statement will print a list of the variables in the table to the log. Here are the log contents for the above code:

    357  create table work.april like sql.march;
    NOTE: Table WORK.APRIL created, with 0 rows and 8 columns.
    358  describe table work.april;
    NOTE: SQL table WORK.APRIL was created like:

    create table WORK.APRIL( bufsize=8192 )
      (
       flight char(3),
       date num format=date7. informat=date7.,
       depart num format=time5. informat=time5.,
       orig char(3),
       dest char(3),
       miles num,
       boarded num,
       capacity num
      );

Notice that the APRIL table has the same columns as the MARCH table. We can create a complete copy of the MARCH table, i.e. the structure and the data, as follows:

    create table sql.march_copy as
       select * from sql.march ;

The following is printed in the log:

    NOTE: Table SQL.MARCH_COPY created, with 46 rows and 8 columns.

and the reader can easily verify that the dataset is in fact a copy.

Another way to create a table is to specify each column and its attributes. Let's create a lookup table for the airport codes that are stored in the variable named dest in the MARCH table:

    create table sql.airport_lu
       (airport_code char(3)  label='Airport Code',
        city         char(40) label='City',
        country      char(40) label='Country'
       );
    describe table sql.airport_lu;

Examining the log shows that the table was created to the specifications given and has no records:

    NOTE: Table SQL.AIRPORT_LU created, with 0 rows and 3 columns.
    10   describe table sql.airport_lu;
    NOTE: SQL table SQL.AIRPORT_LU was created like:

    create table SQL.AIRPORT_LU( bufsize=8192 )
      (
       airport_code char(3) label='Airport Code',
       city char(40) label='City',
       country char(40) label='Country'
      );
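The three ways of creating a table described so far (explicit column definitions, LIKE, and AS SELECT) can be compared side by side. The following is a minimal sketch that runs in the WORK library only; the table names work.airport_demo, work.airport_empty and work.airport_copy and the column city_upper are hypothetical, chosen for this illustration so that nothing in the SQL library is touched.

```sas
/* Hypothetical names throughout; these tables are illustrations only. */
proc sql;
   /* 1. Define each column and its attributes; the table starts empty. */
   create table work.airport_demo
      (airport_code char(3)  label='Airport Code',
       city         char(40) label='City',
       n_gates      num      label='Number of Gates' format=3.0);

   insert into work.airport_demo (airport_code, city, n_gates)
      values ('LAX', 'Los Angeles, CA', 40);

   /* 2. LIKE copies the structure only: same columns, zero rows. */
   create table work.airport_empty like work.airport_demo;

   /* 3. AS SELECT copies structure and data; a calculated column
         (city_upper) is added along the way. */
   create table work.airport_copy as
      select airport_code,
             city,
             n_gates,
             upcase(city) as city_upper label='City (Upper Case)'
         from work.airport_demo;

   /* DESCRIBE TABLE writes each table definition to the log. */
   describe table work.airport_demo, work.airport_empty, work.airport_copy;
quit;
```

After running this, the log notes should report one row in work.airport_copy and zero rows in work.airport_empty, which is a quick way to confirm the difference between LIKE and AS SELECT.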

Now suppose we wanted to construct a lookup table that would indicate whether a flight was an international or a domestic flight. We already know that such a table must include the flight number and at least one column that will categorize the flight as domestic or international. We can start by creating a table that has the list of flight numbers in it as follows:

    create table sql.flight_type_lu as
       select distinct flight as flight_number
          from sql.march ;

    title "Flight Numbers in MARCH";
    select * from sql.flight_type_lu;

The output from the above PROC SQL code is:

    Flight Numbers in MARCH

    flight_number

Now that we can create a table, we need to be able to make changes to the table by adding, deleting or modifying columns and rows.

ALTERING TABLES, INSERTING AND UPDATING ROWS

The ALTER TABLE, INSERT INTO and UPDATE statements are used to modify tables and the data stored in them. The AIRPORT_LU table we created above has three columns: airport_code, city and country. If this lookup table is to be useful, we need to populate it with some data. LAX is the airport code for the Los Angeles International Airport. To add this information to the lookup table we need to INSERT a row:

    INSERT INTO sql.airport_lu (city, country, airport_code)
       VALUES ("Los Angeles, CA", "USA", "LAX");

    select * from sql.airport_lu ;

The following is a partial listing of the log and output:

    NOTE: 1 row was inserted into SQL.AIRPORT_LU.

    Airport
    Code     City             Country
    LAX      Los Angeles, CA  USA

Notice that the column names in the INSERT INTO statement do not have to follow any particular order, as long as the order of the VALUES matches. Not all columns have to be listed in the INSERT INTO statement: a column that is not listed receives the default missing value for its data type. To illustrate this, let's add a record for San Diego, California, to the lookup table:

    INSERT INTO sql.airport_lu (city, country)
       VALUES ("San Diego, CA", "USA");

    select * from sql.airport_lu ;

Output:

    Airport
    Code     City             Country
    LAX      Los Angeles, CA  USA
             San Diego, CA    USA

Notice that there is no code for San Diego since we did not include it in the INSERT INTO statement. We can UPDATE the AIRPORT_LU table to add the code SAN for San Diego:

    UPDATE sql.airport_lu
       set airport_code = "SAN"
       where city = "San Diego, CA";

    select * from sql.airport_lu ;

Output:

    Airport
    Code     City             Country
    LAX      Los Angeles, CA  USA
    SAN      San Diego, CA    USA

The ALTER TABLE statement is used to delete a column from a table or add a column to it. To delete a column from a table, use the following syntax:

    ALTER TABLE table_name
       DROP column_one, column_two, ... ;

To ADD a column to a table, use the following syntax:

    ALTER TABLE table_name
       ADD column_name <column specifications> ;

where the column specifications consist of the column type, label and format. To ADD two columns to the AIRPORT_LU table, we run the following code:

    ALTER TABLE sql.airport_lu
       ADD dom_or_int char(13) format=$13. label="Domestic or International",
           n_gates num format=3.0 label="Number of Gates" ;

which results in the following message printed in the Log:

    NOTE: Table SQL.AIRPORT_LU has been modified, with 5 columns.

This will allow us to classify each airport as Domestic or International and to enter the number of gates available at each. We can add this information to the table with the following code:

    UPDATE sql.airport_lu
       set dom_or_int = "Domestic", n_gates = 40
       where airport_code = "LAX" ;

    title "AIRPORT_LU with added columns" ;
    select * from sql.airport_lu ;

which results in the following message printed in the Log:

    NOTE: 1 row was updated in SQL.AIRPORT_LU.

And the following output:

    AIRPORT_LU with added columns

    Airport                              Domestic or      Number of
    Code     City             Country    International    Gates
    LAX      Los Angeles, CA  USA        Domestic         40
    SAN      San Diego, CA    USA                         .

Note that for the SAN airport record there is a period (.), the missing value, for n_gates (number of gates), and there is no data for dom_or_int (domestic or international), because the UPDATE statement had a WHERE clause restricting the update to LAX. (This output has been edited to fit in the space above.)

To delete or DROP a column from a table we use the following syntax:

    ALTER TABLE table_name
       DROP column_name ;

To delete the columns country and dom_or_int from the AIRPORT_LU table we run the following code:

    ALTER TABLE sql.airport_lu
       DROP country, dom_or_int;

    title "AIRPORT_LU after DROPing country and dom_or_int columns";
    select * from sql.airport_lu ;

which results in the following message printed in the Log:

    NOTE: Table SQL.AIRPORT_LU has been modified, with 3 columns.

And the following output:

    AIRPORT_LU after DROPing country and dom_or_int columns

    Airport                    Number of
    Code     City              Gates
    LAX      Los Angeles, CA   40
    SAN      San Diego, CA     .

DELETING ROWS AND TABLES

The DELETE statement is used with a WHERE clause to delete one or more records from a table. To delete the record for San Diego from the AIRPORT_LU table, we run the following code:

    DELETE from sql.airport_lu
       WHERE airport_code = "SAN";

The following message is printed in the log:

    NOTE: 1 row was deleted from SQL.AIRPORT_LU.

CAUTION: Be careful with the DELETE statement. If you use it without a WHERE clause, all the records in the table will be deleted.

To delete an entire table, use the following syntax:

    DROP TABLE table_name ;

To delete the AIRPORT_LU table that we created and modified, we run the following code:

    DROP table sql.airport_lu;
    describe table sql.airport_lu;

which results in the following messages printed in the Log:

    370  DROP TABLE sql.airport_lu ;
    NOTE: Table SQL.AIRPORT_LU has been dropped.
         describe table sql.airport_lu ;
    ERROR: File SQL.AIRPORT_LU.DATA does not exist.

SUMMARY FUNCTIONS

To summarize data, that is, to produce a statistical summary of the entire table, we must use summary functions such as COUNT, SUM, MIN and MAX in the SELECT clause. To create summaries for sub-groups, a GROUP BY clause must be used in the SELECT statement. If GROUP BY is not used with a summary function, then all the rows in the table or view are considered to be a single group, and the result of the SELECT statement will be one or more summary statistics computed from all the data. The table below lists some of the summary functions more commonly used in PROC SQL. Consult the PROC SQL documentation for other available summary functions.

    Summary Function                   Function Result
    AVG, MEAN                          mean or average of values
    COUNT, COUNT(DISTINCT), FREQ, N    number of nonmissing values
    SUM                                sum of values
    MAX                                largest value
    MIN                                smallest value
    STD                                standard deviation
    NMISS, NMISS(DISTINCT colname)     number of missing values

Any column that exists in a table named in the FROM clause of the SELECT statement can be used as an argument in the functions shown in the table above. If the function is used with a single argument, or column name, the function is applied down the column, producing one summary statistic for the entire SELECT statement or one summary statistic for each group in the GROUP BY clause. If the function is used with two or more arguments, the function is applied across the row, producing one summary statistic for each row.

Suppose we want to calculate the average number of passengers that boarded flights in the MARCH2 table. We can run the following code:

    title "Average Number of Passengers";
    select AVG(boarded) as boarded_avg
       from sql.march2;

which will yield the following result:

    Average Number of Passengers

    boarded_avg

Now, to get the average number of passengers for each flight, we just have to add the column flight to the SELECT statement and add a GROUP BY clause as follows:

    title "Average Number of Passengers By Flight";
    select flight,
           AVG(boarded) as boarded_avg label="Average Number of Passengers" format=6.1
       from sql.march2
       GROUP BY flight;

The resulting output is:

    Average Number of Passengers By Flight

              Average
    Flight    Number of
    Number    Passengers

Let's examine what would happen if we did not use the GROUP BY clause in the previous example. The following are the log contents and a partial listing of the output obtained by running the previous PROC SQL program with the GROUP BY clause removed:

    176  select flight,
    177         AVG(boarded) as boarded_avg label="Average Number of Passengers" format=6.1
    178     from sql.march2 ;
    NOTE: The query requires remerging summary statistics back with the original data.

    Average Number of Passengers By Flight: NO 'GROUP BY'

              Average
    Flight    Number of
    Number    Passengers

Because the column flight was included in the SELECT statement, but no GROUP BY flight clause was specified, the query gives us two things: all the flight numbers from all the records, and the average of the column boarded calculated for the entire table. In other words, the query calculates the average of boarded, attaches (remerges) it to every row of the original data, and gives us the results above.

In the MARCH2 table, we have the columns boarded and empty, which are the number of passengers and the number of empty seats respectively. When we created MARCH2 we did not include capacity in the SELECT statement, so the number of seats on the plane is not included in the table. We can compute capacity by summing boarded and empty. Recall that a summary function with two or more arguments acts across the columns on each row, whereas a summary function with only one argument acts down the rows of the column that is its argument. Consider the following code:

    select flight,
           boarded,
           empty,
           sum(empty) as empty_sum,
           sum(boarded, empty) as capacity,
           min(boarded, sum(boarded, empty)) as min_x label="Min of Boarded and Capacity"
       from sql.march2 ;

SUM(empty) will sum the number of empty seats for all records in the table, but since there is no GROUP BY clause, this single value will be remerged with all the records of the table. SUM(boarded, empty) will sum the fields boarded and empty for each record, resulting in the capacity for each flight. Finally, MIN(boarded, SUM(boarded, empty)) will compute the minimum of boarded and the sum of boarded and empty, or the capacity, for each record. This last example demonstrates that one summary function can be used as an argument to another summary function.

The following is a partial listing of the log and output:

    Another Summary Function Example

                        Number                           Min of
    Flight              of Empty                         Boarded and
    Number    boarded   Seats      empty_sum  capacity   Capacity

We can now generate some reports that summarize the data in a table, as in the following example:

    proc sql ;
       title "MARCH: Flight Loads";
       select flight as flt label = "Flight Number",
              dest as dst label = "Flight Destination",
              MEAN(boarded) as occup_avg label = "Average Occupancy" format=8.1,
              MIN(boarded) as occup_min label = "Minimum Occupancy",
              MAX(boarded) as occup_max label = "Maximum Occupancy",
              MIN(capacity - boarded) as empty_min label = "Minimum Number of Empty Seats"
          from sql.march
          Group By flight, dest;
    quit ;

The output for this example is:

    MARCH: Flight Loads

                                                               Minimum Number
    Flight    Flight        Average     Minimum     Maximum    of Empty
    Number    Destination   Occupancy   Occupancy   Occupancy  Seats
              LAX
              YYZ
              ORD
              LON
              PAR
              WAS
              FRA

HAVING: FILTERING GROUPED DATA

The HAVING clause is used following a GROUP BY clause to filter grouped data. HAVING acts on the grouped or aggregated data, in contrast to WHERE, which acts on the individual rows of the table. You can use summary functions with HAVING, but you cannot use a summary function with the WHERE clause if the summary function is aggregating data across rows.

In the preceding example, we grouped the data by flight number and destination and computed some summary statistics for each group. If we wanted to include in our report only those flights where the minimum number of empty seats exceeded 40, we would add a HAVING clause to the code as follows:

    title "HAVING: Filtering Grouped Data - Flights with a Minimum of 40 Empty Seats";
    select flight as flt label = "Flight Number",
           dest as dst label = "Flight Destination",
           MEAN(boarded) as occup_avg label = "Average Occupancy" format=8.1,
           MIN(boarded) as occup_min label = "Minimum Occupancy",
           MAX(boarded) as occup_max label = "Maximum Occupancy",
           MIN(capacity - boarded) as empty_min label = "Minimum Number of Empty Seats"
       from sql.march
       Group By flight, dest
       HAVING empty_min > 40 ;

All the flights whose minimum number of empty seats was 40 or fewer have been excluded from the output, which follows:

    HAVING: Filtering Grouped Data - Flights with a Minimum of 40 Empty Seats

                                                               Minimum Number
    Flight    Flight        Average     Minimum     Maximum    of Empty
    Number    Destination   Occupancy   Occupancy   Occupancy  Seats
              PAR
              WAS

The results would have been quite different if we had used a WHERE clause instead of a HAVING clause. Consider the following PROC SQL query, where the HAVING clause has been replaced by a WHERE clause:

    title "Trying to Filter Grouped Data with WHERE";

    select flight as flt label = "Flight Number",
           dest as dst label = "Flight Destination",
           MEAN(boarded) as occup_avg label = "Average Occupancy" format=8.1,
           MIN(boarded) as occup_min label = "Minimum Occupancy",
           MAX(boarded) as occup_max label = "Maximum Occupancy",
           MIN(capacity - boarded) as empty_min label = "Minimum Number of Empty Seats"
       from sql.march
       where capacity-boarded > 40
       Group By flight, dest;

As the following output shows, this query returns grouped data after eliminating the individual flight records where the number of empty seats, capacity-boarded, was 40 or fewer. The WHERE clause filters rows before they are grouped, so every destination that has at least one qualifying record still appears.

    Trying to Filter Grouped Data with WHERE

                                                               Minimum Number
    Flight    Flight        Average     Minimum     Maximum    of Empty
    Number    Destination   Occupancy   Occupancy   Occupancy  Seats
              LAX
              YYZ
              ORD
              LON
              PAR
              WAS
              FRA

COMBINING DATA FROM DIFFERENT TABLES

Often we will need to combine or select data from different tables. We will introduce some of the common ways to accomplish this with PROC SQL.

OUTER UNION CORR: APPENDING TABLES OR QUERY RESULTS

Concatenating, or appending, two or more tables or query results can be accomplished by placing the set operator OUTER UNION CORR between the queries whose results are to be concatenated. OUTER UNION CORR is a set operator that appends query results by combining the records from both queries and aligning columns that have the same name and type. If OUTER UNION is used without CORR, none of the columns will be aligned, and the result of the query will have a total number of columns equal to the sum of the number of columns in each of the queries being concatenated.

As an illustration, let us append data from a table named APRIL to the data from the MARCH table. The APRIL table has the same columns and same type of flight information as the MARCH table. To limit the output we will add a WHERE clause to each query, and we will rename one column in one of the queries. The following PROC SQL code will accomplish this:

    proc sql ;
       title "OUTER UNION - Concatenating MARCH and APRIL without CORR";
       select flight, date, boarded
          from sql.march
          where flight IN ("114","219")
       OUTER UNION
       select flight, date, boarded as n_passengers
          from sql2.april
          where dest LIKE "L%"   /* select only flights where dest begins with L */
       ;

A partial listing of the output shows that none of the columns were aligned:

    OUTER UNION - Concatenating MARCH and APRIL

    flight  date  boarded  flight  date  n_passengers
            MAR
            MAR
            MAR
            MAR
                                   APR
                                   APR
                                   APR
                                   APR

Notice that the rows of APRIL have been appended to the rows of MARCH and that there are separate columns for the columns from MARCH and those from APRIL. To align the columns in the result, we modify the code by adding CORR to the OUTER UNION set operator:

    proc sql ;
       title "OUTER UNION CORR- Concatenating MARCH and APRIL";
       select flight, date, boarded
          from sql.march
          where flight IN ("114","219")
       OUTER UNION CORR
       select flight, date, boarded as n_passengers
          from sql2.april
          where dest LIKE "L%"   /* select only flights where dest begins with "L" */
       ;

The following is a partial listing of the output:

    OUTER UNION CORR- Concatenating MARCH and APRIL

    flight  date  boarded  n_passengers
            MAR
            MAR
            MAR
            APR
            APR
            APR

The columns flight and date have been aligned, but the columns for boarded have not. If you look at the code for this example, you will notice that in the query selecting data from APRIL, the column boarded was renamed to n_passengers and is no longer an exact match for boarded from MARCH. Therefore, boarded and n_passengers are not aligned.

Technically, OUTER UNION CORR operates on the results of queries, not directly on tables. However, if we add a CREATE TABLE clause to the code, the concatenated records shown in the output are saved to a new table, and we have effectively concatenated or appended the APRIL table to the MARCH table.

Other set operators which you may find useful are UNION, EXCEPT and INTERSECT. These are all documented in the SQL Procedure User's Guide available on the web at support.sas.com.

JOINING TABLES OR QUERY RESULTS

Joining tables in PROC SQL is similar to merging datasets in the data step. Usually, tables are joined based on one or more common columns. When joining tables, you must specify a join condition. That is, you must tell PROC SQL what column(s) or field(s) to use in order to match rows in one table to the corresponding rows in the other table(s). Tables that have no common column can also be joined if necessary.

Suppose we have tables A and B from a hospital database.

    Table A contains patient information for patients that visited the emergency room of a hospital: name, address, gender, date of birth, patient account number, etc.
    Table B contains information on surgical procedures performed: type of surgery, date of surgery, patient account number, etc.

What do tables A and B have in common? They both include a patient account number for each record. Each patient coming to the emergency room will be represented once (per visit) in table A. Any patient in the hospital that has had any surgery will be represented in table B with one record per surgical procedure performed. Not all patients having surgery come through the emergency room, and not all patients that come to the emergency room have surgery. Therefore, not all patients in table A will have records in table B, and vice versa.

Tables A and B have at least one column in common, and some records in each table belong to the same observational units, i.e. have the same value of the common column.

    [Figure: overlapping circles for tables A and B]

INNER JOINS

If we want to get a list of the patients who came to the emergency room and had surgery, we need to get the records identified by the purple intersection in the figure above. An INNER JOIN of A and B selects the rows that have the same value of the common column in each table.

    [Figure: inner join of A and B]

OUTER JOINS

There are three types of Outer Joins:

    1. Left Outer Joins
    2. Right Outer Joins
    3. Full Outer Joins

LEFT OUTER JOINS

Suppose we wanted to get a list of all patients who came to the emergency room (A) and information about their surgery history (B), if any. We need all the rows in A and any rows in B whose patient id matches a patient id in A.
RIGHT OUTER JOINS

If we wanted to get a list of all patients who had surgical procedures (B) and information about their emergency room visit (A), if any, we need all the rows in B and any rows in A whose patient id matches a patient id in B.

    [Figure: Left Outer Join: all rows from A, some rows from B. Right Outer Join: some rows from A, all rows from B.]
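The inner, left and right joins described so far can be tried on a pair of very small tables. The following is a minimal sketch only: the tables work.er_visits and work.surgeries, their columns and their rows are hypothetical stand-ins for tables A and B, with acct_no playing the role of the patient account number.

```sas
/* Hypothetical tables and values, loosely modeled on tables A and B. */
proc sql;
   create table work.er_visits (acct_no num, name char(20));
   create table work.surgeries (acct_no num, surgery char(20));

   /* Patient 101 is only in A, 103 is only in B, 102 is in both. */
   insert into work.er_visits
      values (101, 'Patient One')
      values (102, 'Patient Two');

   insert into work.surgeries
      values (102, 'Appendectomy')
      values (103, 'Knee repair');

   title 'Inner join: ER patients who also had surgery';
   select a.acct_no, a.name, b.surgery
      from work.er_visits as a
           inner join work.surgeries as b
           on a.acct_no = b.acct_no;

   title 'Left outer join: every ER patient, surgery if any';
   select a.acct_no, a.name, b.surgery
      from work.er_visits as a
           left join work.surgeries as b
           on a.acct_no = b.acct_no;

   title 'Right outer join: every surgery, ER visit if any';
   select b.acct_no, a.name, b.surgery
      from work.er_visits as a
           right join work.surgeries as b
           on a.acct_no = b.acct_no;
quit;
title;   /* clear the title */
```

With these rows the inner join returns only account 102. The left join returns 101 and 102, with a blank surgery for 101, and the right join returns 102 and 103, with a blank name for 103. The right join selects b.acct_no rather than a.acct_no because the unmatched row has no value on the A side.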

17 FULL OUTER JOINS If we wanted a complete listing of all patients who came to the emergency room (A) OR had surgery (B) then we would use a FULL OUTER JOIN. Full Outer Join A A B B All rows from A and all rows from B. Minimum number of rows: In general, the three types of Outer Joins yield the following results: 1. Left: selects all the rows in the left table and any associated rows from the right table 2. Right: selects all the rows in the right table and any associated rows from the left table 3. Full: selects all rows from both tables Returning to the flight data, the reader will recall that the MARCH table had a flight number and a destination code. But unless we have memorized all the codes for all the airports, those destination codes are not very descriptive. Now recall that we were constructing a lookup table AIRPORT_LU which had the city and country for each airport code. We can join the two tables so that our query results can include the destination city as well as the airport code. The airport code is the common column to both tables, so MARCH will be joined to AIRPORT_LU using the airport code in an INNER JOIN as follows: proc sql ; title INNER JOIN ; select A.flight, A.dest, B.city, sum(a.boarded) as passenger_tot label="total Number of Passengers" format=comma9.0 from sql.march as A, sql2.airport_lu as B WHERE A.dest = B.code group by A.flight, A.dest, B.city ; Running the above code creates the following output: INNER JOIN Total Number of flight dest City Where Airport is Located Passengers LAX Los Angeles, CA 1, YYZ Toronto, ON ORD Chicago, IL FRA Frankfurt 1,095 The first thing one should notice in the output is that while the city has been added to the output, not all the airport codes from MARCH are listed. The obvious thing to do would be to check the AIRPORT_LU table and see if any of our airport codes are missing from that table. 
But another way would be to use a LEFT OUTER JOIN with MARCH listed on the left so that all the airport destination codes from MARCH and any matching codes from AIRPORT_LU would be listed.

Another thing the reader may have noticed is that in the PROC SQL code we have assigned ALIASes A and B for tables MARCH and AIRPORT_LU respectively. The aliases make it possible to specify which table each column in the SELECT statement will come from, since we are now dealing with more than one table. The following code sets up a LEFT OUTER JOIN for our query:

proc sql ;
  title "LEFT OUTER JOIN" ;
  select A.dest, B.city,
         sum(A.boarded) as passenger_tot
           label="Total Number of Passengers" format=comma9.0
    from sql.march as A
         LEFT OUTER JOIN
         sql2.airport_lu as B
      ON A.dest = B.code
    Group by A.dest, B.city ;
quit ;

Running the above code creates the following output:

LEFT OUTER JOIN

                                      Total Number of
dest   City Where Airport is Located       Passengers
FRA    Frankfurt                                1,095
LAX    Los Angeles, CA                          1,071
LON                                             1,338
ORD    Chicago, IL                                931
PAR                                               867
WAS                                               622
YYZ    Toronto, ON                                884

All records from MARCH are selected and any matching information from AIRPORT_LU has been selected. Since there seems to be no city listed for codes LON, PAR and WAS, these codes are either not in the AIRPORT_LU table, or they are not valid codes. A RIGHT OUTER JOIN with AIRPORT_LU on the right would yield all the cities in the lookup table and any matching codes from MARCH, and it is left as an exercise for the reader.

MACRO INTERFACE TO PROC SQL

It is very easy to store SQL query results in macro variables which can be used later to enhance our output or for other reasons. We can store individual values in a macro variable, and we can store the values of several records as a delimited string in one macro variable. This section is not about teaching the user about macro variables. The purpose here is to give the user a simple tool to use.

If we wanted to enhance our reports with titles that included the beginning and ending dates of the period covered in the data, as well as a list of the destinations, we could run two queries to get the information and then hard code it into the titles, or we can use the macro interface to store the information in macro variables so that we can re-use the code without manually editing the titles. The following code shows how to store the earliest and latest dates from the MARCH table in two macro variables and also how to store a list of the destination codes, separated by commas, in one macro variable.
proc sql NOPRINT;   /* NOPRINT will suppress any output */
  select min(date) as start_date format = DATE7.,
         max(date) as end_date format = DATE7.
    into :start_date, :end_date
    from sql.march;
  %PUT START_DATE: &start_date --- END_DATE: &end_date;   /* check the LOG */
  select unique(dest) as destination
    into :destination separated by ","
    from sql.march ;
  %PUT Destination List: &destination;   /* check the LOG */
quit ;

The log will show the following:

852  %PUT START_DATE: &start_date --- END_DATE: &end_date;
START_DATE: 01MAR94 --- END_DATE: 07MAR94

858  %PUT Destination List: &destination;
Destination List: FRA,LAX,LON,ORD,PAR,WAS,YYZ

PUTTING IT ALL TOGETHER

The following is the code to create a summary report from the MARCH flight data table. This example combines most of the features and options of PROC SQL introduced in this tutorial.

proc sql NOPRINT;
  select min(A.date) as start_date format = DATE7.,
         max(A.date) as end_date format = DATE7.
    into :start_date, :end_date
    from sql.march as A;
  select unique(A.city) as destination
    into :destination separated by " - "   /* use - to delimit the list */
    from sql2.airport_lu as A ;

proc sql ;
  title "FLIGHT REPORT FOR: &start_date - &end_date" ;
  title2 "Destinations: ";
  title3 "&destination" ;
  select B.city as dst label = "Flight Destination" format=$30.,
         COUNT(distinct A.flight) as nd_flight label="Number of Flights" format=comma6.0,
         COUNT(A.flight) as n_flight label="Total Number of Flights" format=comma6.0,
         MEAN(A.boarded) as occup_avg label = "Average Occupancy" format=8.1,
         SUM(A.boarded) as occup_min label = "Total Number of Passengers" format=comma9.0
    from sql.march as A
         LEFT OUTER JOIN
         sql2.airport_lu as B
      ON A.dest = B.code
    Group By A.flight, B.city
  OUTER UNION CORR
  select "ALL Cities" as dst label = "Flight Destination" format=$30.,
         COUNT(distinct A.flight) as nd_flight label="Number of Flights" format=comma6.1,
         COUNT(A.flight) as n_flight label="Total Number of Flights" format=comma6.1,
         MEAN(boarded) as occup_avg label = "Average Occupancy" format=8.1,
         SUM(boarded) as occup_min label = "Total Number of Passengers" format=comma9.0
    from sql.march as A
         LEFT OUTER JOIN
         sql2.airport_lu as B
      ON A.dest = B.code;
quit ;

Our report is now shown below as it appears in the output window.
FLIGHT REPORT FOR: 01MAR94-07MAR94
Destinations:
Chicago, IL - Frankfurt - London - Los Angeles, CA - Paris - Toronto, ON - Washington DC

                                  Total                    Total
                      Number     Number                   Number
                          of         of     Average           of
Flight Destination   Flights    Flights   Occupancy   Passengers
Chicago, IL                …          …           …          931
Frankfurt                  …          …           …        1,095
London                     …          …           …        1,338
Los Angeles, CA            …          …           …        1,071
Paris                      …          …           …          867
Toronto, ON                …          …           …          884
Washington DC              …          …           …          622
ALL Cities                 …          …           …        6,808
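A report like the one above can be sent to a file instead of the output window by placing the query between two ODS statements. The following is a minimal sketch, not the author's code: the file name flight_report.rtf and the title text are placeholders, and a shorter query stands in for the full report query.

```sas
/* flight_report.rtf is a placeholder; supply a path of your own */
ods rtf file="flight_report.rtf" ;

proc sql ;
  title "Total Passengers by Destination" ;
  select dest,
         sum(boarded) as passenger_tot
           label="Total Number of Passengers" format=comma9.0
    from sql.march
    group by dest ;
quit ;

/* closing the destination finishes writing the RTF file */
ods rtf close ;
```

Everything printed between the two ODS statements is written to the RTF file, so the full report query could be placed there unchanged.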

A basic ODS statement will output this table as an RTF file which can be opened in any standard word processing program.

CONCLUSION

Many aspects of PROC SQL, including most of the syntax required to perform data management and reporting tasks, have been introduced in this tutorial. SAS users new to the SQL procedure should have gained a good understanding of the different tasks that can be performed with this procedure. The reader should now be familiar enough with PROC SQL to search the documentation for options, functions, clauses and statements that will help solve problems of greater complexity. The material presented in this tutorial should serve as a springboard from which the SAS user can dive right into PROC SQL and not only manage to stay afloat, but also get the results they seek.

CONTACT INFORMATION

If you would like a copy of the code and datasets used in this tutorial, please send me an email with your request. Your comments and questions are valued and encouraged. Contact the author at:

Richard Severino
Convergence CT
1132 Bishop Street Suite 615
Honolulu HI
rseverino@convergencect.com, severino@hawaii.edu

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.


More information

SQL Queries. for. Mere Mortals. Third Edition. A Hands-On Guide to Data Manipulation in SQL. John L. Viescas Michael J. Hernandez

SQL Queries. for. Mere Mortals. Third Edition. A Hands-On Guide to Data Manipulation in SQL. John L. Viescas Michael J. Hernandez SQL Queries for Mere Mortals Third Edition A Hands-On Guide to Data Manipulation in SQL John L. Viescas Michael J. Hernandez r A TT TAddison-Wesley Upper Saddle River, NJ Boston Indianapolis San Francisco

More information

How to Create Data-Driven Lists

How to Create Data-Driven Lists Paper 9540-2016 How to Create Data-Driven Lists Kate Burnett-Isaacs, Statistics Canada ABSTRACT As SAS programmers we often want our code or program logic to be driven by the data at hand, rather than

More information

Microsoft Access XP (2002) - Advanced Queries

Microsoft Access XP (2002) - Advanced Queries Microsoft Access XP (2002) - Advanced Queries Group/Summary Operations Change Join Properties Not Equal Query Parameter Queries Working with Text IIF Queries Expression Builder Backing up Tables Action

More information

Lesson 2. Data Manipulation Language

Lesson 2. Data Manipulation Language Lesson 2 Data Manipulation Language IN THIS LESSON YOU WILL LEARN To add data to the database. To remove data. To update existing data. To retrieve the information from the database that fulfil the stablished

More information

Oracle Database: SQL and PL/SQL Fundamentals Ed 2

Oracle Database: SQL and PL/SQL Fundamentals Ed 2 Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 67863102 Oracle Database: SQL and PL/SQL Fundamentals Ed 2 Duration: 5 Days What you will learn This Oracle Database: SQL and PL/SQL Fundamentals

More information

Paper PS05_05 Using SAS to Process Repeated Measures Data Terry Fain, RAND Corporation Cyndie Gareleck, RAND Corporation

Paper PS05_05 Using SAS to Process Repeated Measures Data Terry Fain, RAND Corporation Cyndie Gareleck, RAND Corporation Paper PS05_05 Using SAS to Process Repeated Measures Data Terry Fain, RAND Corporation Cyndie Gareleck, RAND Corporation ABSTRACT Data that contain multiple observations per case are called repeated measures

More information

SAS 9 Programming Enhancements Marje Fecht, Prowerk Consulting Ltd Mississauga, Ontario, Canada

SAS 9 Programming Enhancements Marje Fecht, Prowerk Consulting Ltd Mississauga, Ontario, Canada SAS 9 Programming Enhancements Marje Fecht, Prowerk Consulting Ltd Mississauga, Ontario, Canada ABSTRACT Performance improvements are the well-publicized enhancement to SAS 9, but what else has changed

More information

Using Data Set Options in PROC SQL Kenneth W. Borowiak Howard M. Proskin & Associates, Inc., Rochester, NY

Using Data Set Options in PROC SQL Kenneth W. Borowiak Howard M. Proskin & Associates, Inc., Rochester, NY Using Data Set Options in PROC SQL Kenneth W. Borowiak Howard M. Proskin & Associates, Inc., Rochester, NY ABSTRACT Data set options are an often over-looked feature when querying and manipulating SAS

More information

Tales from the Help Desk 6: Solutions to Common SAS Tasks

Tales from the Help Desk 6: Solutions to Common SAS Tasks SESUG 2015 ABSTRACT Paper BB-72 Tales from the Help Desk 6: Solutions to Common SAS Tasks Bruce Gilsen, Federal Reserve Board, Washington, DC In 30 years as a SAS consultant at the Federal Reserve Board,

More information

Base and Advance SAS

Base and Advance SAS Base and Advance SAS BASE SAS INTRODUCTION An Overview of the SAS System SAS Tasks Output produced by the SAS System SAS Tools (SAS Program - Data step and Proc step) A sample SAS program Exploring SAS

More information

SAS Certification Handout #8: Adv. Prog. Ch. 1-2

SAS Certification Handout #8: Adv. Prog. Ch. 1-2 /* First, make example data SAS Certification Handout #8: Adv. Prog. Ch. 1-2 libname cert 'C:/jrstevens/Teaching/SAS_Cert/AdvNotes' /* In SAS Studio, after creating SAS_Cert folder with username jrstevens:

More information

Chapter 7. Introduction to Structured Query Language (SQL) Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel

Chapter 7. Introduction to Structured Query Language (SQL) Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel Chapter 7 Introduction to Structured Query Language (SQL) Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel 1 In this chapter, you will learn: The basic commands

More information

SAS Online Training: Course contents: Agenda:

SAS Online Training: Course contents: Agenda: SAS Online Training: Course contents: Agenda: (1) Base SAS (6) Clinical SAS Online Training with Real time Projects (2) Advance SAS (7) Financial SAS Training Real time Projects (3) SQL (8) CV preparation

More information

Square Peg, Square Hole Getting Tables to Fit on Slides in the ODS Destination for PowerPoint

Square Peg, Square Hole Getting Tables to Fit on Slides in the ODS Destination for PowerPoint PharmaSUG 2018 - Paper DV-01 Square Peg, Square Hole Getting Tables to Fit on Slides in the ODS Destination for PowerPoint Jane Eslinger, SAS Institute Inc. ABSTRACT An output table is a square. A slide

More information

STIDistrict Query (Basic)

STIDistrict Query (Basic) STIDistrict Query (Basic) Creating a Basic Query To create a basic query in the Query Builder, open the STIDistrict workstation and click on Utilities Query Builder. When the program opens, database objects

More information

Introducing a Colorful Proc Tabulate Ben Cochran, The Bedford Group, Raleigh, NC

Introducing a Colorful Proc Tabulate Ben Cochran, The Bedford Group, Raleigh, NC Paper S1-09-2013 Introducing a Colorful Proc Tabulate Ben Cochran, The Bedford Group, Raleigh, NC ABSTRACT Several years ago, one of my clients was in the business of selling reports to hospitals. He used

More information

Guide Users along Information Pathways and Surf through the Data

Guide Users along Information Pathways and Surf through the Data Guide Users along Information Pathways and Surf through the Data Stephen Overton, Overton Technologies, LLC, Raleigh, NC ABSTRACT Business information can be consumed many ways using the SAS Enterprise

More information

How to Look Up People Using LDAP in Eudora

How to Look Up People Using LDAP in Eudora How to Look Up People Using LDAP in Eudora Introduction Eudora lets you look up individuals on the Internet and within your company using several Directory Services protocols. Each of these protocols is

More information

Introduction / Overview

Introduction / Overview Paper # SC18 Exploring SAS Generation Data Sets Kirk Paul Lafler, Software Intelligence Corporation Abstract Users have at their disposal a unique and powerful feature for retaining historical copies of

More information

2) SQL includes a data definition language, a data manipulation language, and SQL/Persistent stored modules. Answer: TRUE Diff: 2 Page Ref: 36

2) SQL includes a data definition language, a data manipulation language, and SQL/Persistent stored modules. Answer: TRUE Diff: 2 Page Ref: 36 Database Processing, 12e (Kroenke/Auer) Chapter 2: Introduction to Structured Query Language (SQL) 1) SQL stands for Standard Query Language. Diff: 1 Page Ref: 32 2) SQL includes a data definition language,

More information

Lab # 6. Using Subqueries and Set Operators. Eng. Alaa O Shama

Lab # 6. Using Subqueries and Set Operators. Eng. Alaa O Shama The Islamic University of Gaza Faculty of Engineering Department of Computer Engineering ECOM 4113: Database Lab Lab # 6 Using Subqueries and Set Operators Eng. Alaa O Shama November, 2015 Objectives:

More information