Why choose between SAS Data Step and PROC SQL when you can have both?

Similar documents
Top 5 Handy PROC SQL Tips You Didn t Think Were Possible

Same Data Different Attributes: Cloning Issues with Data Sets Brian Varney, Experis Business Analytics, Portage, MI

Know Thy Data : Techniques for Data Exploration

Better Metadata Through SAS II: %SYSFUNC, PROC DATASETS, and Dictionary Tables

SQL Metadata Applications: I Hate Typing

Uncommon Techniques for Common Variables

How to Create Data-Driven Lists

Ready To Become Really Productive Using PROC SQL? Sunil K. Gupta, Gupta Programming, Simi Valley, CA

INTERMEDIATE SQL GOING BEYOND THE SELECT. Created by Brian Duffey

PREREQUISITES FOR EXAMPLES

OLAP Drill-through Table Considerations

Base and Advance SAS

Guide Users along Information Pathways and Surf through the Data

The Power of Combining Data with the PROC SQL

Creating and Executing Stored Compiled DATA Step Programs

WHAT ARE SASHELP VIEWS?

An SQL Tutorial Some Random Tips

Greenspace: A Macro to Improve a SAS Data Set Footprint

Taming a Spreadsheet Importation Monster

Stat Wk 3. Stat 342 Notes. Week 3, Page 1 / 71

IF there is a Better Way than IF-THEN

T-SQL Training: T-SQL for SQL Server for Developers

Oracle Database 10g: Introduction to SQL

Paper S Data Presentation 101: An Analyst s Perspective

SAS coding for those who like to be control

Unit Assessment Guide

PROC SQL vs. DATA Step Processing. T Winand, Customer Success Technical Team

SAS File Management. Improving Performance CHAPTER 37

Create Metadata Documentation using ExcelXP

Cleaning Duplicate Observations on a Chessboard of Missing Values Mayrita Vitvitska, ClinOps, LLC, San Francisco, CA

INTRODUCTION TO PROC SQL JEFF SIMPSON SYSTEMS ENGINEER

INTRODUCTION TO SAS HOW SAS WORKS READING RAW DATA INTO SAS

Querying Data with Transact-SQL

Planting Your Rows: Using SAS Formats to Make the Generation of Zero- Filled Rows in Tables Less Thorny

JUST PASSING THROUGH OR ARE YOU? DETERMINE WHEN SQL PASS THROUGH OCCURS TO OPTIMIZE YOUR QUERIES Misty Johnson Wisconsin Department of Health

Querying Data with Transact SQL

The Power of PROC SQL Techniques and SAS Dictionary Tables in Handling Data

Compare Two Identical Tables Data In Different Oracle Databases

If You Need These OBS and These VARS, Then Drop IF, and Keep WHERE Jay Iyengar, Data Systems Consultants LLC

Arena: Membership Lists Foundations (Hands On)

Introduction to SAS. Cristina Murray-Krezan Research Assistant Professor of Internal Medicine Biostatistician, CTSC

I KNOW HOW TO PROGRAM IN SAS HOW DO I NAVIGATE SAS ENTERPRISE GUIDE?

Not Just Merge - Complex Derivation Made Easy by Hash Object

WEEK 3 TERADATA EXERCISES GUIDE

%MISSING: A SAS Macro to Report Missing Value Percentages for a Multi-Year Multi-File Information System

Chapter 6: Modifying and Combining Data Sets

ERROR: The following columns were not found in the contributing table: vacation_allowed

Demystifying PROC SQL Join Algorithms

A Better Perspective of SASHELP Views

Ditch the Data Memo: Using Macro Variables and Outer Union Corresponding in PROC SQL to Create Data Set Summary Tables Andrea Shane MDRC, Oakland, CA

Dictionary.coumns is your friend while appending or moving data

SQL Let s Join Together

BASICS BEFORE STARTING SAS DATAWAREHOSING Concepts What is ETL ETL Concepts What is OLAP SAS. What is SAS History of SAS Modules available SAS

Querying Data with Transact SQL Microsoft Official Curriculum (MOC 20761)

Tips & Tricks. With lots of help from other SUG and SUGI presenters. SAS HUG Meeting, November 18, 2010

Tackling Unique Problems Using TWO SET Statements in ONE DATA Step. Ben Cochran, The Bedford Group, Raleigh, NC

Aster Data Basics Class Outline

The GEOCODE Procedure and SAS Visual Analytics

Contents of SAS Programming Techniques

PharmaSUG Paper TT11

Using Proc Freq for Manageable Data Summarization

Checking for Duplicates Wendi L. Wright

Using Data Set Options in PROC SQL Kenneth W. Borowiak Howard M. Proskin & Associates, Inc., Rochester, NY

Aster Data SQL and MapReduce Class Outline

The Proc Transpose Cookbook

100 THE NUANCES OF COMBINING MULTIPLE HOSPITAL DATA

MTA Database Administrator Fundamentals Course

20761 Querying Data with Transact SQL

SAS BI Dashboard 3.1. User s Guide Second Edition

Getting Up to Speed with PROC REPORT Kimberly LeBouton, K.J.L. Computing, Rossmoor, CA

BAKING OR COOKING LEARN THE DIFFERENCE BETWEEN SAS ARRAYS AND HASH OBJECTS

PharmaSUG China Paper 70

Exploring DATA Step Merge and PROC SQL Join Techniques Kirk Paul Lafler, Software Intelligence Corporation, Spring Valley, California

PROC CATALOG, the Wish Book SAS Procedure Louise Hadden, Abt Associates Inc., Cambridge, MA

Ten Great Reasons to Learn SAS Software's SQL Procedure

David Ghan SAS Education

SAS Viya 3.1 FAQ for Processing UTF-8 Data

Omitting Records with Invalid Default Values

Run your reports through that last loop to standardize the presentation attributes

SAS System Powers Web Measurement Solution at U S WEST

DB2 SQL Class Outline

Easing into Data Exploration, Reporting, and Analytics Using SAS Enterprise Guide

ABSTRACT. Paper

Conditional Processing in the SAS Software by Example

Principles of Data Management

STIDistrict Query (Basic)

SAS CURRICULUM. BASE SAS Introduction

Why Hash? Glen Becker, USAA

CSC Web Programming. Introduction to SQL

An Easy Way to Split a SAS Data Set into Unique and Non-Unique Row Subsets Thomas E. Billings, MUFG Union Bank, N.A., San Francisco, California

SAS. Information Map Studio 3.1: Creating Your First Information Map

Hidden in plain sight: my top ten underpublicized enhancements in SAS Versions 9.2 and 9.3

SAS Certification Handout #9: Adv. Prog. Ch. 3-4

Data Edit-checks Integration using ODS Tagset Niraj J. Pandya, Element Technologies Inc., NJ Vinodh Paida, Impressive Systems Inc.

The NESTED Procedure (Chapter)

A Methodology for Truly Dynamic Prompting in SAS Stored Processes

PharmaSUG China 2018 Paper AD-62

Data movement issues: Explicit SQL Pass-Through can do the trick

Exploring DICTIONARY Tables and SASHELP Views

Using PROC SQL to Calculate FIRSTOBS David C. Tabano, Kaiser Permanente, Denver, CO

Transcription:

Paper QT-09 Why choose between SAS Data Step and PROC SQL when you can have both? Charu Shankar, SAS Canada ABSTRACT As a SAS coder, you've often wondered what the SQL buzz is about. Or vice versa you breathe SQL & don't have time for SAS. Learn where the SAS data step has a distinct advantage over SQL. Learn where you just can't beat SQL. 1. READING RAW DATA The data step is able to read raw data. Since PROC SQL does not read raw data, turn to the data step to read your flat files. data dsrawdata; infile datalines dlm=','; input name $ gender $ age height; datalines; Alfred,M,14,69 Alice,F,13,56.5 Barbara,F,13,65.3 ; Display 1. Read raw data with the data step WINNER: SAS DATA STEP 2. JOINING DATA The SAS data step uses Merging techniques to join tables while PROC SQL uses join algorithms. PROC SQL offers more flexibility in joins: you don t necessarily have to join on same named columns, nor are you limited to joining only on equality, nor do you have to explicitly pre-sort data. Data Step Merge * prepare data for merging; proc sort data=sashelp.prdsal2 out=prdsal2; by state; proc sort data=sashelp.us_data out=us_data(drop=state); by statename; Proc SQL Join create table sqljoin as select COUNTRY,COUNTY,PRODUCT,STATENAME,POPULATION_2010 from sashelp.prdsal2 as p, sashelp.us_data as us where p.state = us.statename; 1

*rename variables to allow merging; data dsmerge; merge prdsal2(in=inprd) us_data(in=inus rename=(statename=state)); by state; if inprd and inus; keep COUNTRY COUNTY PRODUCT STATE POPULATION_2010; Display 2. Data Step Merge vs. PROC SQL Join Match-Merge SQL Inner Join Number of datasets/ size No limit to number or size other than disk space. Max number of tables in join 256. Data processing sequential so that observations with duplicate BY values are joined one-to-one. Cartesian product for duplicate BY values. Output datasets Multiple data sets can be created. Only one data set can be created Complex business logic using IF-THEN or SELECT/WHEN logic. CASE logic; however, not as flexible as DATA step syntax. Sorted/indexed datasets Prerequisite for merging Not necessary Join condition Equality only Inequal joins can be performed. Same named variables Same named BY variables must be available in all data sets. Same named variables do not have to be in all data sets. Display 3. Data Step Merge vs. PROC SQL Join comparison WINNER PROC SQL 3. ACCUMULATING DATA The data step can get an accumulating or running total with greater ease by using a SUM statement. The PROC SQL step is a little bit more involved. A running total is the summation of a sequence of numbers which is updated each time a new number is added to the sequence, by adding the value of the new number to the previous running total. 2

Proc sql accumulating data data shoes; set sashelp.shoes; obs=_n_; create table sqlrunning as select region, product, sales, (select sum(a.sales) from shoes as a where a.obs <= b.obs) as Running_total from shoes as b; Display 4. Accumulating data WINNER: DATA STEP Data step accumulating data data dsrunning; set shoes; keep region product sales running_total; running_total + sales; 4. AGGREGATING DATA The Boolean expression is a logical expression that evaluates to either true or false. With PROC SQL you can get really creative in what this expression looks like. Just like adults, newborns come in a range of healthy sizes. Most babies born between 37 and 40 weeks weigh somewhere between 5 pounds, 8 ounces (2,500 grams) and 8 pounds, 13 ounces (4,000 grams). If you want to see the number of over average weight babies who were born to married women and whose mom were smokers, use the slick Boolean operation in Proc Sql. Proc Sql aggregating data Data step aggregating data /*1. prep data for summarizing*/ create table sqlboolean as data dsboolean; select visit, set bweight; sum(weight > 4000 and married=1 by visit; and momsmoke=1) as wgt4000 'over if first.visit then do; average weight', wgt4000=0; sum(weight <=2500 and married=1 wle2500=0; and momsmoke=1) as wle2500 'under end; average weight' if weight > 4000 and married=1 and from sashelp.bweight momsmoke=1 then wgt4000 + 1; group by visit; else if weight <=2500 and married=1 and momsmoke=1 then wle2500 + 1; if last.visit; label wgt4000 ='over average weight' wle2500 ='under average weight'; keep visit wgt4000 wle2500; Display 5. Aggregating data WINNER: PROC SQL 5. MANAGING DATA Dictionary tables are a fascinating way to get metadata quickly. If you are trying to locate ID named columns that are also numeric, PROC SQL will do the trick speedily. 3

select libname, memname, name, type, length from dictionary.columns where upcase(name) contains 'ID' and libname='sashelp' and type='num'; NOTE: Table WORK.SQLDICT created, with 34 rows and 5 columns. NOTE: PROCEDURE SQL used (Total process time): real time 0.77 seconds user cpu time 0.37 seconds system cpu time 0.34 seconds memory 5623.92k OS Memory 29176.00k Timestamp 03/24/2018 12:38:22 AM Display 6. PROC SQL to locate all numeric ID columns in the SASHELP library data dsdict; set sashelp.vcolumn; keep libname memname name type length; where upcase(name) contains 'ID' and libname='sashelp' and type='num'; NOTE: There were 34 observations read from the data set SASHELP.VCOLUMN. WHERE UPCASE(name) contains 'ID' and (libname='sashelp') and (type='num'); NOTE: The data set WORK.DSDICT has 34 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 2.86 seconds user cpu time 1.26 seconds system cpu time 1.42 seconds memory 6505.20k OS Memory 29432.00k Display 6. Data step to locate all numeric ID columns in the SASHELP library Here s why PROC SQL was faster. While querying a DICTIONARY table, SAS launches a discovery process. Depending on the DICTIONARY table being queried, this discovery process can search libraries, open tables, and execute views. The PROC SQL step runs much faster than other SAS procedures and the DATA step. This is because PROC SQL can optimize the query before the discovery process is launched. The WHERE clause is processed before the tables referenced by the SASHELP.VCOLUMN view are opened. WINNER: PROC SQL CONCLUSION Both the SAS data step and PROC SQL are powerful tools to read, analyze, manage and report on data. The goal of this paper was to shine light on 5 common data scenarios and showcase how sometimes the data step is crisper and more efficient and sometimes PROC SQL rises above the data step. Knowing the strengths of each of these tools in your SAS toolkit will help draw upon the right tool at the right time. 4

REFERENCES Huang, Chao Top 10 SQL trips in SAS, SAS Global Forum 2014 https://support.sas.com/resources/papers/proceedings14/1561-2014.pdf Shankar, Charu, #1 SAS programming tip for 2012, SAS Training Post, May 10, 2012 https://blogs.sas.com/content/sastraining/2012/05/10/1-sas-programming-tip-for-2012/ ACKNOWLEDGMENTS Charu is grateful to Pharmasug for accepting her paper. This paper originally was presented at the Wisconsin Illinois SAS Users group as an invited paper. Charu has modified this paper for Pharmasug and included up to date references. She appreciates her manager Stephen Keelan and SAS Canada for the support and encouragement to share her SAS and SQL knowledge. She is grateful to her many wonderful customers and students whose ongoing questions provided the impetus to research & share SAS data step and PROC SQL techniques. CONTACT INFORMATION The author welcomes correspondence about this work. You can contact her at: Charu Shankar Senior Technical Training Specialist SAS Institute Inc. 280 King Street East, Toronto, ON M5A 1K7 Charu.Shankar@sas.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. Why 5