Using PROC SQL to Calculate FIRSTOBS David C. Tabano, Kaiser Permanente, Denver, CO

ABSTRACT

The power of SAS programming can at times be greatly improved by using PROC SQL statements to format and manipulate data, and there is considerable literature comparing the strengths of PROC SQL with those of the DATA step. This paper extends such comparisons by reviewing the use of the DATA step and PROC SQL to select the first observation(s) of a particular incident within a specified date range across repeated measures in datasets.

INTRODUCTION

Data processing and manipulation are important steps in any query process. Building datasets that effectively describe data and prepare it for more robust analysis requires many steps, and programmers are always looking for more efficient ways to perform such tasks. SAS has several powerful tools for such manipulation, so many that the choices can be overwhelming. The SAS DATA step is often the baseline for any dataset creation or manipulation, as it is both flexible and efficient. The DATA step allows for considerable functionality, particularly in iterative processing within a dataset and through observations based on defined conditional statements.

In addition to the DATA step, SAS has incorporated Structured Query Language (SQL) into Base SAS, Versions 6 and above. SQL has grown across multiple software platforms as the standard for data manipulation, particularly for querying data stored in relational databases. In SAS, the SQL procedure (very similar to the SQL found on other data software platforms) acts as both a substitute for and a complement to the standard DATA step. PROC SQL has many advantages over the traditional DATA step, both in coding efficiency and descriptive statistics.
One of the main advantages of PROC SQL is the ability to merge, join and rearrange variables in a dataset without first sorting the dataset, a considerable time-saver for programmers working with very large datasets.

REVIEW OF PROC SQL VERSUS DATA STEP

This paper assumes familiarity with PROC SQL and the DATA step in SAS, but it is important to reiterate a few key points in comparing the two. Many of the references listed at the end of this paper provide more detailed comparisons of PROC SQL and the DATA step.

SQL is a programming language designed to query data in relational databases. PROC SQL was implemented in SAS to give programmers a tool for working with relational databases in a similar fashion. PROC SQL can create SAS datasets from relational data sources, effectively bridging the gap between complex relational databases and traditional flat-file datasets. The basic syntax of PROC SQL is similar to the DATA step syntax but is based on the elements of a relational database. The following table lists the differences in terminology between the SAS and SQL programming languages [1]:

    SAS             SQL
    Datasets        Tables
    Observations    Rows
    Variables       Columns
    Merge           Join
    Extract         Query

Broadly, PROC SQL can perform many of the same tasks as several other SAS procedures. One of the advantages of PROC SQL is the relative uniformity of the code used to generate different results. For example, creating and viewing a table in SAS normally requires at least two steps: a DATA step to create the dataset and a PRINT procedure to view the observations and variables.

    DATA TEST_A;
        SET WORK.DATASOURCE;
    RUN;

    PROC PRINT DATA=TEST_A;
    RUN;

[1] Dickstein, Craig and Pass, Ray, DATA Step vs. PROC SQL: What's A Neophyte To Do?, Proceedings from the Twenty-Ninth Annual SAS Users Group International Conference, paper 296.

Using PROC SQL, the same result can be generated in a single step, without the need to create the dataset (or table, in this case).

    PROC SQL;
        SELECT *
        FROM WORK.DATASOURCE;
    QUIT;

Similarly, merging two datasets using the DATA step necessitates a preliminary sort so that both datasets are ordered the same way before the MERGE statement operates. The first example below is a simple merge of two tables on a single common variable; the second performs a conditional merge of dataset TEST_B to TEST_A, keeping all observations from TEST_A and joining observations from TEST_B only where the BY variable matches.

    /* Simple merge on the common variable VAR */
    PROC SORT DATA=TEST_A; BY VAR; RUN;
    PROC SORT DATA=TEST_B; BY VAR; RUN;

    DATA TEST_C;
        MERGE TEST_A TEST_B;
        BY VAR;
    RUN;

    /* Conditional merge: keep every observation from TEST_A */
    PROC SORT DATA=TEST_A; BY VAR; RUN;
    PROC SORT DATA=TEST_B; BY VAR; RUN;

    DATA TEST_C;
        MERGE TEST_A (IN=A) TEST_B (IN=B);
        BY VAR;
        IF (A=1);
    RUN;

PROC SQL does not require the tables to be sorted the same way (or in any particular way) prior to a join. If both sources contain the same column in the same format, the two tables can be joined in one step. The first query below is a simple join of the two tables on a single common column; the second performs a LEFT OUTER JOIN in which all the data from TEST_A are retained and only the data matching on VAR are added from TEST_B.

    /* Simple join */
    PROC SQL;
        CREATE TABLE TEST_C AS
        SELECT *
        FROM TEST_A A, TEST_B B
        WHERE A.VAR = B.VAR;
    QUIT;

    /* Left outer join: keep every row from TEST_A */
    PROC SQL;
        CREATE TABLE TEST_C AS
        SELECT *
        FROM TEST_A A
        LEFT OUTER JOIN TEST_B B
            ON B.VAR = A.VAR;
    QUIT;

Depending on the objective of the programmer, the structure of the data source(s), and the desired query results, using PROC SQL and the DATA step together can afford considerable flexibility and efficiency in manipulating data across various sources. PROC SQL does have certain limitations relative to the DATA step. For example, PROC SQL cannot perform iterative processing on its own.
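To make the idea of iterative, row-by-row processing concrete, the following is a minimal sketch of a DATA step that numbers each visit within a patient. The dataset VISITS, its variables, and its values are hypothetical and exist only for this illustration; they are not part of the data used elsewhere in this paper.

```sas
/* Hypothetical sample data: repeated visits per patient (illustration only) */
data visits;
    input patid visit_dt :date9.;
    format visit_dt date9.;
    datalines;
1 05JAN2005
1 12MAR2005
2 20FEB2005
2 01JUN2005
2 15SEP2005
;
run;

/* BY-group processing requires the data to be sorted by the BY variables */
proc sort data=visits;
    by patid visit_dt;
run;

/* Row-by-row processing: count visits within each patient */
data visits_numbered;
    set visits;
    by patid;
    if first.patid then visit_num = 0;  /* reset the counter at each new patient */
    visit_num + 1;                      /* sum statement: value is retained across rows */
    is_last_visit = last.patid;         /* 1 on the final row for the patient, else 0 */
run;
```

The sum statement carries VISIT_NUM from one observation to the next, which is the kind of sequential dependency that a single SQL query does not express directly.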
In PROC SQL, datasets are processed relationally, based on columns (variables) in the dataset, whereas the DATA step works with the observations within the dataset itself on a row-by-row basis. For analysis that requires processing repeated measures, the DATA step can accomplish iterative processing of the dataset with one of several tools (FIRST., LAST., RETAIN, OUTPUT, etc.). For PROC SQL, the answer is more nuanced than a single option, but it is not impossible with a proper understanding of joins, nested queries and grouping of tables. The next sections compare the use of the DATA step and PROC SQL for processing repeated measures in a dataset.

REPEATED MEASURES AND DATA

Repeated measures arise when a variable is recorded in multiple observations for the same subject, with the observations distinguished by other criteria. Cross-sectional and longitudinal data are two examples of repeated measures. One of the most common ways repeated measures are displayed in longitudinal data is across categorical variables involving multivariate time series data, or data containing different events with multiple overlapping characteristics. Consider medical data, where a group of patients visit their provider over a period of several years. A database of longitudinal data could be maintained, containing all of the patients' vital statistics [2]. A second dataset contains demographic information on the patient, such as birth date, gender, etc. The following SAS code and output give an example of a dataset with repeated measures of Body-Mass-Index (BMI) and age over time.

    /* Build dataset by merging datasets with variables of interest */
    data vital (keep=patid measure_dt birth_date age bmi gender);
        retain patid age gender measure_dt bmi;
        merge vitals demographics;
        by patid;
        age = round((measure_dt-birth_date)/365.25,1);
    run;

LOG

    NOTE: There were 9833 observations read from the data set WORK.VITALS.
    NOTE: There were 1000 observations read from the data set WORK.DEMOGRAPHICS.
    NOTE: The data set WORK.VITAL has 9833 observations and 6 variables.
    NOTE: DATA statement used (Total process time):

    /* Prints out the first ten observations in dataset */
    proc print data=vital (firstobs=1 obs=10);
    run;

LOG

    NOTE: There were 10 observations read from the data set WORK.VITAL.
    NOTE: PROCEDURE PRINT used (Total process time):

OUTPUT

    The SAS System
    Obs patid age gender measure_dt bmi birth_date
    M 11/28/ /11/
    M 10/12/ /20/
    M 01/15/ /20/
    M 12/18/ /20/
    M 08/04/ /20/
    M 07/29/ /20/
    M 06/26/ /20/
    M 04/02/ /20/
    M 04/06/ /20/
    M 04/21/ /20/1953

As the output above shows, patients are identified in the dataset by PATID (patient ID), MEASURE_DT (date of vitals measurement), and the BMI measurement corresponding to the MEASURE_DT. Patient-level fields (PATID, AGE, BIRTH_DATE, and GENDER) repeat across the multiple vitals records for each patient.

COHORT DESIGN

Programmers working with datasets structured with repeated measures often need to subset these datasets based on particular criteria.
In epidemiological research, building cohorts of patient populations based on the first instance of a condition within a time period is a typical first step before more sophisticated analysis can take place. Suppose a researcher wanted to build a cohort of patients whose first recorded BMI >= 35 occurred between January 1, 2005 and December 31, 2006. Using the DATA step, this can be accomplished in three simple steps. First, a dataset containing the cohort criteria is built; then it is sorted by the identifier desired for the first instance of the cohort criteria (in this case PATID); and finally the first PATID/BMI record within each patient is taken using the FIRST.PATID variable. The SAS code below annotates the steps.

[2] Dates and patient IDs have been offset to ensure that the sample data presented in this paper contain no PHI and are completely de-identified.

    /* Step 1- Define cohort criteria- date range, BMI threshold */
    data step1;
        set vital;
        if '01jan05'd le measure_dt le '31dec06'd and bmi ge 35;
    run;

    /* Step 2- Sort data by PATID and MEASURE_DT to order rows for FIRST. in Step 3 */
    proc sort data=step1 out=step2;
        by patid measure_dt;
    run;

    /* Step 3- Identify first instance of cohort criteria */
    data bmi_firstobs_data;
        set step2;
        by patid;
        if first.patid;
    run;

    proc sort data=bmi_firstobs_data;
        by measure_dt patid;
    run;

    proc print data=bmi_firstobs_data (firstobs=1 obs=10);
    run;

LOG

    NOTE: There were 9833 observations read from the data set WORK.VITAL.
    NOTE: The data set WORK.STEP1 has 540 observations and 6 variables.
    NOTE: DATA statement used (Total process time):
    NOTE: There were 540 observations read from the data set WORK.STEP1.
    NOTE: The data set WORK.STEP2 has 540 observations and 6 variables.
    NOTE: PROCEDURE SORT used (Total process time):
    NOTE: There were 540 observations read from the data set WORK.STEP2.
    NOTE: The data set WORK.BMI_FIRSTOBS_DATA has 106 observations and 6 variables.
    NOTE: DATA statement used (Total process time):
    NOTE: There were 106 observations read from the data set WORK.BMI_FIRSTOBS_DATA.
    NOTE: The data set WORK.BMI_FIRSTOBS_DATA has 106 observations and 6 variables.
    NOTE: PROCEDURE SORT used (Total process time):
    NOTE: There were 10 observations read from the data set WORK.BMI_FIRSTOBS_DATA.

    NOTE: PROCEDURE PRINT used (Total process time):

OUTPUT

    The SAS System
    Obs patid age gender measure_dt bmi birth_date
    F 01/03/ /28/
    F 01/04/ /13/
    M 01/06/ /24/
    F 01/06/ /21/
    F 01/07/ /15/
    F 01/10/ /23/
    F 01/13/ /08/
    F 01/15/ /22/
    M 01/16/ /25/
    F 01/18/ /08/1936

The dataset BMI_FIRSTOBS_DATA contains the subset of the original vitals measures in the dataset VITAL that fit the BMI and measurement date criteria defined above.

Using PROC SQL to build the same cohort requires a little more intuition but can save time in both processing speed and CPU resources. Consider the original VITALS and DEMOGRAPHICS tables and the steps above to merge and subset the two data sources. Using several nested queries, or queries-within-queries, PROC SQL can accomplish the same cohort construction in a single PROC SQL statement.

The code begins by identifying the final structure of the cohort table, namely the columns (variables) desired, grouping, ordering, etc. Using SELECT DISTINCT (which removes duplicate rows, comparable to the NODUPKEY option in PROC SORT), the query limits results to unique PATID/AGE/GENDER/MEASURE_DT/BMI combinations. The FROM clause would normally reference a dataset where the source data is located; in this case it begins the first nested query (N1), which opens with a new SELECT DISTINCT and its own column selections. While the table structure in N1 mirrors the structure of the table being created a layer above (i.e. the same columns are being pulled, albeit in a different order), the column names are preceded by a table alias. These aliases reference tables and tell PROC SQL which table contains which column of interest. The second FROM clause leads into the second nested query (N2), which is where the cohort is actually defined in the WHERE clause (BMI >= 35, MEASURE_DT between January 1, 2005 and December 31, 2006).
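The innermost query can be easier to follow when run on its own. The sketch below applies the same cohort criteria and per-patient minimum to a small table; the table VITALS_DEMO and its five rows are hypothetical values invented for this illustration and are not drawn from the paper's data.

```sas
/* Hypothetical miniature vitals table (illustration only) */
data vitals_demo;
    input patid measure_dt :date9. bmi;
    format measure_dt mmddyy10.;
    datalines;
1 15MAR2005 36.2
1 10JUN2005 37.0
2 01DEC2004 38.5
2 20FEB2006 35.4
3 05MAY2005 29.9
;
run;

/* Earliest qualifying measure date per patient, mirroring the shape of N2 */
proc sql;
    select distinct patid,
           min(measure_dt) as measure_dt format=mmddyy10.
    from vitals_demo
    where bmi >= 35                                            /* BMI criterion */
      and measure_dt between '01jan2005'd and '31dec2006'd     /* observation period */
    group by patid;
quit;
```

With these invented rows, patient 1 is returned with 03/15/2005 and patient 2 with 02/20/2006 (the December 2004 reading falls outside the observation period), while patient 3 never reaches the BMI threshold and is not returned.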
N2 selects only the columns PATID and MIN(MEASURE_DT), the latter being the first (numerically lowest) measure date that satisfies the query's criteria. N2 contains both a SELECT DISTINCT and a GROUP BY clause on PATID. The GROUP BY clause is what ensures that MIN(VITALS.MEASURE_DT) is computed within each PATID; without it, the minimum would be taken over the entire table, generating the same minimum measure date for the whole population instead of within each patient's observed BMI records in the observation period. With the GROUP BY in place the query already returns one row per PATID, so the SELECT DISTINCT is redundant here and serves only as a safeguard against duplicate rows.

N2 is closed with the table alias A and then rejoined to the VITALS table to obtain the BMI record associated with the MIN(MEASURE_DT) and BMI selection criteria in N2. Because N2 is joined to VITALS on both PATID and MEASURE_DT, N1 carries the correct minimum measure date and its associated BMI reading. N1 is also joined to the DEMOGRAPHICS table to pull in the desired demographic information. N1 is then closed, and the outer query runs, generating the summary table BMI_FIRSTOBS_SQL. The final table populates the desired fields with the data obtained in N1, which in turn was generated from the data obtained in N2. Finally, BMI_FIRSTOBS_SQL is ordered by PATID and MEASURE_DT to match the repeated measures structure of BMI_FIRSTOBS_DATA. A PROC COMPARE is performed on the two datasets to verify that they are identical.

    /* Nested PROC SQL */
    proc sql;
        create table bmi_firstobs_sql as
        select distinct patid,
               round((measure_dt-birth_date)/365.25,1) as age,
               gender,
               measure_dt,
               bmi
        from
            /* Begin first nested query N1 */
            (select distinct a.patid,
                    A.measure_dt,
                    demo.birth_date,
                    vitals.bmi as bmi,
                    demo.gender
             from
                 /* Begin second nested query N2 */
                 (select distinct vitals.patid,
                         min(vitals.measure_dt) as measure_dt format=mmddyy10.
                  from vitals vitals
                  where vitals.bmi >= 35  /* BMI Cohort Criteria */
                    and vitals.measure_dt between '01jan05'd and '31dec06'd  /* Observation Period Criteria */
                  group by vitals.patid) A  /* End N2 */
             /* Rejoin A to VITALS to ensure retained MEASURE_DT matches MEASURE_DT from table A */
             left outer join vitals vitals  /* VITALS table alias */
                 on A.patid = vitals.patid
                and A.measure_dt = vitals.measure_dt
             /* Join second nested query containing data from VITALS to the DEMOGRAPHICS table */
             left outer join demographics demo  /* DEMOGRAPHICS table alias */
                 on demo.patid = A.patid)  /* End N1 */
        order by patid, measure_dt;
    quit;

LOG

    NOTE: Table WORK.BMI_FIRSTOBS_SQL created, with 106 rows and 5 columns.
    NOTE: PROCEDURE SQL used (Total process time):
          real time 0.07 seconds
          cpu time  0.03 seconds

    /* Verify */
    /* Sort bmi_firstobs_data to match bmi_firstobs_sql */
    proc sort data=bmi_firstobs_data;
        by patid measure_dt;
    run;

    proc compare base=bmi_firstobs_data compare=bmi_firstobs_sql;
    run;

LOG

    NOTE: There were 106 observations read from the data set WORK.BMI_FIRSTOBS_DATA.
    NOTE: The data set WORK.BMI_FIRSTOBS_DATA has 106 observations and 6 variables.
    NOTE: PROCEDURE SORT used (Total process time):
    NOTE: There were 106 observations read from the data set WORK.BMI_FIRSTOBS_DATA.
    NOTE: There were 106 observations read from the data set WORK.BMI_FIRSTOBS_SQL.

    NOTE: PROCEDURE COMPARE used (Total process time):

OUTPUT

    The SAS System
    The COMPARE Procedure
    Comparison of WORK.BMI_FIRSTOBS_DATA with WORK.BMI_FIRSTOBS_SQL
    (Method=EXACT)

    Data Set Summary
    Dataset Created Modified NVar NObs
    WORK.BMI_FIRSTOBS_DATA 03AUG11:15:48:54 03AUG11:15:48:
    WORK.BMI_FIRSTOBS_SQL 03AUG11:15:46:08 03AUG11:15:46:

    Variables Summary
    Number of Variables in Common: 5.
    Number of Variables in WORK.BMI_FIRSTOBS_DATA but not in WORK.BMI_FIRSTOBS_SQL: 1.

    Observation Summary
    Observation Base Compare
    First Obs 1 1
    Last Obs

    Number of Observations in Common: 106.
    Total Number of Observations Read from WORK.BMI_FIRSTOBS_DATA: 106.
    Total Number of Observations Read from WORK.BMI_FIRSTOBS_SQL: 106.
    Number of Observations with Some Compared Variables Unequal: 0.
    Number of Observations with All Compared Variables Equal: 106.

    NOTE: No unequal values were found. All values compared are exactly equal.

AGGREGATION

The power and efficiency of nested PROC SQL statements can be shown when generating simple summary datasets with aggregate results. Borrowing from the previous example, a programmer can reduce the output datasets to a crosstab of patient counts by year, age group and BMI group. Using PROC FORMAT to establish BMI and age groups for aggregation, both the DATA step and PROC SQL can generate the same summarized datasets of aggregate patient counts. The code below lists the steps to generate output using both methods.

First, the formats AGEGRP_ and BMIGRP_ roll up the AGE and BMI values into grouped ranges (age bands of 10 to 15 years and BMI bands of 10 units, respectively). The dataset STEP5 converts individual measure dates into years using the YEAR function. Finally, a PROC FREQ generates the table FREQSUMMARY with a patient count crosstab of YEAR by AGEGRP by BMIGRP.
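As a minimal sketch of the roll-up mechanics just described, the following applies a user-defined format with PUT and extracts the year with YEAR before counting with PROC FREQ. The format DEMO_BMIGRP, the dataset DEMO_VISITS, and all values are hypothetical and chosen only to keep the example self-contained; they are not the formats or data used in this paper.

```sas
/* Hypothetical format: maps numeric BMI values to group labels */
proc format;
    value demo_bmigrp
        .         = 'Missing'
        low -< 30 = 'Under 30'
        30 -< 40  = '30-39'
        40 - high = '40+';
run;

/* Hypothetical first-observation records (illustration only) */
data demo_visits;
    input patid measure_dt :date9. bmi;
    format measure_dt date9.;
    datalines;
1 15MAR2005 36.2
2 20FEB2006 41.7
3 05MAY2006 38.0
;
run;

data demo_grouped;
    set demo_visits;
    year   = year(measure_dt);        /* calendar year of the measure date */
    bmigrp = put(bmi, demo_bmigrp.);  /* character group label from the format */
run;

/* One output row per YEAR/BMIGRP combination with its frequency */
proc freq data=demo_grouped;
    tables year*bmigrp / list nopercent;
run;
```

Because PUT returns the formatted label as a character value, the grouping variable can be used directly in a TABLES request or, equivalently, in a GROUP BY clause.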
While the DATA step requires an additional three steps of coding to generate these summary results, PROC SQL creates the same dataset in one statement using the same nested query with only minor adjustments to the output dataset structure. The dataset BMI_SUMMARY_SQL is created using the same nested query structure as in the previous example and, like the DATA step that creates STEP5, modifies the AGE and BMI columns using the AGEGRP_ and BMIGRP_ formats and MEASURE_DT using the YEAR function. Instead of using SELECT DISTINCT, PATID is aggregated to counts using the COUNT(DISTINCT ...) function in PROC SQL. Finally, BMI_SUMMARY_SQL is aggregated using the GROUP BY clause below, generating a crosstab of patient counts by the new columns YEAR, AGEGRP and BMIGRP.

    /* Aggregation */
    /* Formats */
    proc format;
        /* Age Groups */
        value agegrp_
            .          = '00: NULL '
            low -< 15  = '01: 1-17 yrs old'
            15 -< 25   = '02: yrs old'
            25 -< 35   = '03: yrs old'
            35 -< 50   = '04: yrs old'
            50 -< 65   = '05: yrs old'
            65 - high  = '06: 65+ yrs old'
            ;
        /* BMI Groups */
        value bmigrp_
            .          = '00: NULL '
            low -< 20  = '01: BMI under 20'
            20 -< 30   = '02: BMI 20-29'
            30 -< 40   = '03: BMI 30-39'
            40 -< 50   = '04: BMI 40-49'
            50 - high  = '05: BMI 50+'
            ;
    run;

    /* Step 4- Sort data for aggregation */
    proc sort data=bmi_firstobs_data out=step4;
        by patid;
    run;

    /* Step 5- Aggregate data */
    data step5 (keep=year patid agegrp bmigrp);
        set step4;
        by patid;
        if first.patid;
        agegrp = put(age, agegrp_.);
        bmigrp = put(bmi, bmigrp_.);
        year   = year(measure_dt);  /* need to format dates to year */
    run;

    /* Step 6- Summarize data */
    proc freq data=step5;
        tables year*agegrp*bmigrp / list noprint nopercent out=freqsummary;
    run;

    /* Nested PROC SQL Summary */
    proc sql;
        create table bmi_summary_sql as
        select year(measure_dt) as year,
               put(round((measure_dt-birth_date)/365.25,1), agegrp_.) as agegrp,  /* Format for aggregation */
               put(bmi, bmigrp_.) as bmigrp,                                      /* Format for aggregation */
               count(distinct patid) as count                                     /* Count patients for summarization */
        from
            (select distinct a.patid,
                    a.measure_dt,
                    demo.birth_date,
                    vitals.bmi as bmi,
                    demo.gender
             from
                 (select distinct vitals.patid,
                         min(vitals.measure_dt) as measure_dt format=mmddyy10.
                  from vitals vitals
                  where vitals.bmi >= 35
                    and vitals.measure_dt between '01jan05'd and '31dec06'd
                  group by vitals.patid) a
             left outer join demographics demo
                 on demo.patid = a.patid
             left outer join vitals vitals
                 on a.patid = vitals.patid
                and a.measure_dt = vitals.measure_dt)
        group by year, agegrp, bmigrp  /* Aggregate patient counts by GROUP BY statement */
        order by year, agegrp, bmigrp;
    quit;

    /* Validate results */
    proc compare base=freqsummary compare=bmi_summary_sql;
    run;

LOG

    NOTE: Format AGEGRP_ has been output.
    NOTE: Format BMIGRP_ has been output.
    NOTE: PROCEDURE FORMAT used (Total process time):
    NOTE: Input data set is already sorted; it has been copied to the output data set.
    NOTE: There were 106 observations read from the data set WORK.BMI_FIRSTOBS_DATA.
    NOTE: The data set WORK.STEP4 has 106 observations and 6 variables.
    NOTE: PROCEDURE SORT used (Total process time):
    NOTE: There were 106 observations read from the data set WORK.STEP4.
    NOTE: The data set WORK.STEP5 has 106 observations and 4 variables.
    NOTE: DATA statement used (Total process time):
          real time 0.04 seconds
          cpu time  0.03 seconds
    NOTE: There were 106 observations read from the data set WORK.STEP5.
    NOTE: The data set WORK.FREQSUMMARY has 21 observations and 5 variables.
    NOTE: PROCEDURE FREQ used (Total process time): 0.03 seconds
    NOTE: Table WORK.BMI_SUMMARY_SQL created, with 21 rows and 4 columns.
    NOTE: PROCEDURE SQL used (Total process time): 0.10 seconds
    378 proc compare base=freqsummary compare=bmi_summary_sql;
    379
    NOTE: There were 21 observations read from the data set WORK.FREQSUMMARY.
    NOTE: There were 21 observations read from the data set WORK.BMI_SUMMARY_SQL.

    NOTE: PROCEDURE COMPARE used (Total process time):

OUTPUT

    The SAS System
    The COMPARE Procedure
    Comparison of WORK.FREQSUMMARY with WORK.BMI_SUMMARY_SQL
    (Method=EXACT)

    Data Set Summary
    Dataset Created Modified NVar NObs
    WORK.FREQSUMMARY 03AUG11:17:35:47 03AUG11:17:35:
    WORK.BMI_SUMMARY_SQL 03AUG11:17:35:58 03AUG11:17:35:

    Variables Summary
    Number of Variables in Common: 4.
    Number of Variables in WORK.FREQSUMMARY but not in WORK.BMI_SUMMARY_SQL: 1.
    Number of Variables with Differing Attributes: 1.

    Listing of Common Variables with Differing Attributes
    Variable  Dataset               Type  Length  Label
    COUNT     WORK.FREQSUMMARY      Num   8       Frequency Count
              WORK.BMI_SUMMARY_SQL  Num   8

    Observation Summary
    Observation Base Compare
    First Obs 1 1
    Last Obs

    Number of Observations in Common: 21.
    Total Number of Observations Read from WORK.FREQSUMMARY: 21.
    Total Number of Observations Read from WORK.BMI_SUMMARY_SQL: 21.
    Number of Observations with Some Compared Variables Unequal: 0.
    Number of Observations with All Compared Variables Equal: 21.

    NOTE: No unequal values were found. All values compared are exactly equal.

CONCLUSION

PROC SQL has considerable advantages in coding efficiency over the DATA step, particularly when subsetting data on repeated measures. While the intuition behind using PROC SQL to reproduce FIRST. processing isn't as apparent as with the traditional DATA step, its concise presentation in both coding structure and minimal output can help expedite program generation and distribution across users. Creating aggregate counts and summaries from tables generated in PROC SQL adds to the efficiency of this robust procedure. While the traditional DATA step can in many instances be more efficient in terms of query speed and table generation for incidence/prevalence work, the PROC SQL method presented in this paper has coding advantages with respect to data organization and transference to other SQL-based software platforms.

REFERENCES

Dickstein, Craig and Pass, Ray, DATA Step vs. PROC SQL: What's A Neophyte To Do?, Proceedings from the Twenty-Ninth Annual SAS Users Group International Conference, paper 296.

Fain, Terry and Gareleck, Cyndie (254-30), Using SAS to Process Repeated Measures Data, Proceedings of the Thirtieth Annual SAS Users Group International Conference, paper 254.

Matthews, Joann, PROC SQL Versus The DATA Step, Northeastern SAS Users Group Conference, 2006.

SAS Institute Inc. (1990), Base SAS 9.2 SAS Procedure Guide, Third Edition, Cary, NC: SAS Institute Inc.

Severino, Richard, PROC FREQ: It's More Than Counts, Proceedings of the Twenty-Fifth Annual SAS Users Group International Conference, paper 69, 2000.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

    David Tabano
    Kaiser Permanente
    2550 S. Parker Rd., Suite 300
    Aurora, CO
    (303) (work)
    (303) (fax)
    david.c.tabano@kp.org

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.


More information

Using Proc Freq for Manageable Data Summarization

Using Proc Freq for Manageable Data Summarization 1 CC27 Using Proc Freq for Manageable Data Summarization Curtis Wolf, DataCeutics, Inc. A SIMPLE BUT POWERFUL PROC The Frequency procedure can be very useful for getting a general sense of the contents

More information

MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9)

MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9) Technology & Information Management Instructor: Michael Kremer, Ph.D. Class 6 Professional Program: Data Administration and Management MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9) AGENDA

More information

9 Ways to Join Two Datasets David Franklin, Independent Consultant, New Hampshire, USA

9 Ways to Join Two Datasets David Franklin, Independent Consultant, New Hampshire, USA 9 Ways to Join Two Datasets David Franklin, Independent Consultant, New Hampshire, USA ABSTRACT Joining or merging data is one of the fundamental actions carried out when manipulating data to bring it

More information

Advanced Visualization using TIBCO Spotfire and SAS

Advanced Visualization using TIBCO Spotfire and SAS PharmaSUG 2018 - Paper DV-04 ABSTRACT Advanced Visualization using TIBCO Spotfire and SAS Ajay Gupta, PPD, Morrisville, USA In Pharmaceuticals/CRO industries, you may receive requests from stakeholders

More information

Anatomy of a Merge Gone Wrong James Lew, Compu-Stat Consulting, Scarborough, ON, Canada Joshua Horstman, Nested Loop Consulting, Indianapolis, IN, USA

Anatomy of a Merge Gone Wrong James Lew, Compu-Stat Consulting, Scarborough, ON, Canada Joshua Horstman, Nested Loop Consulting, Indianapolis, IN, USA ABSTRACT PharmaSUG 2013 - Paper TF22 Anatomy of a Merge Gone Wrong James Lew, Compu-Stat Consulting, Scarborough, ON, Canada Joshua Horstman, Nested Loop Consulting, Indianapolis, IN, USA The merge is

More information

Modular processing for Prep-to-research analysis- interfacing XML, SAS and Microsoft Excel David C. Tabano, Kaiser Permanente, Denver, CO

Modular processing for Prep-to-research analysis- interfacing XML, SAS and Microsoft Excel David C. Tabano, Kaiser Permanente, Denver, CO Modular processing for Prep-to-research analysis- interfacing XML, SAS and Microsoft Excel David C. Tabano, Kaiser Permanente, Denver, CO ABSTRACT Healthcare research is often preceded by a prep-to-research

More information

Correcting for natural time lag bias in non-participants in pre-post intervention evaluation studies

Correcting for natural time lag bias in non-participants in pre-post intervention evaluation studies Correcting for natural time lag bias in non-participants in pre-post intervention evaluation studies Gandhi R Bhattarai PhD, OptumInsight, Rocky Hill, CT ABSTRACT Measuring the change in outcomes between

More information

Clinical Data Visualization using TIBCO Spotfire and SAS

Clinical Data Visualization using TIBCO Spotfire and SAS ABSTRACT SESUG Paper RIV107-2017 Clinical Data Visualization using TIBCO Spotfire and SAS Ajay Gupta, PPD, Morrisville, USA In Pharmaceuticals/CRO industries, you may receive requests from stakeholders

More information

The Power of PROC SQL Techniques and SAS Dictionary Tables in Handling Data

The Power of PROC SQL Techniques and SAS Dictionary Tables in Handling Data Paper PO31 The Power of PROC SQL Techniques and SAS Dictionary Tables in Handling Data MaryAnne DePesquo Hope, Health Services Advisory Group, Phoenix, Arizona Fen Fen Li, Health Services Advisory Group,

More information

Personally Identifiable Information Secured Transformation

Personally Identifiable Information Secured Transformation , ABSTRACT Organizations that create and store Personally Identifiable Information (PII) are often required to de-identify sensitive data to protect individuals privacy. There are multiple methods that

More information

Files Arriving at an Inconvenient Time? Let SAS Process Your Files with FILEEXIST While You Sleep

Files Arriving at an Inconvenient Time? Let SAS Process Your Files with FILEEXIST While You Sleep Files Arriving at an Inconvenient Time? Let SAS Process Your Files with FILEEXIST While You Sleep Educational Testing Service SAS and all other SAS Institute Inc. product or service names are registered

More information

Macro to compute best transform variable for the model

Macro to compute best transform variable for the model Paper 3103-2015 Macro to compute best transform variable for the model Nancy Hu, Discover Financial Service ABSTRACT This study is intended to assist Analysts to generate the best of variables using simple

More information

Using PROC SQL to Generate Shift Tables More Efficiently

Using PROC SQL to Generate Shift Tables More Efficiently ABSTRACT SESUG Paper 218-2018 Using PROC SQL to Generate Shift Tables More Efficiently Jenna Cody, IQVIA Shift tables display the change in the frequency of subjects across specified categories from baseline

More information

Mastering Data Summarization with PROC SQL

Mastering Data Summarization with PROC SQL ABSTRACT Paper 6481-2016 Mastering Data Summarization with PROC SQL Christianna Williams PhD, Chapel Hill, NC The SQL procedure is extremely powerful when it comes to summarizing and aggregating data,

More information

Tales from the Help Desk 6: Solutions to Common SAS Tasks

Tales from the Help Desk 6: Solutions to Common SAS Tasks SESUG 2015 ABSTRACT Paper BB-72 Tales from the Help Desk 6: Solutions to Common SAS Tasks Bruce Gilsen, Federal Reserve Board, Washington, DC In 30 years as a SAS consultant at the Federal Reserve Board,

More information

If You Need These OBS and These VARS, Then Drop IF, and Keep WHERE Jay Iyengar, Data Systems Consultants LLC

If You Need These OBS and These VARS, Then Drop IF, and Keep WHERE Jay Iyengar, Data Systems Consultants LLC Paper 2417-2018 If You Need These OBS and These VARS, Then Drop IF, and Keep WHERE Jay Iyengar, Data Systems Consultants LLC ABSTRACT Reading data effectively in the DATA step requires knowing the implications

More information

Paper PS05_05 Using SAS to Process Repeated Measures Data Terry Fain, RAND Corporation Cyndie Gareleck, RAND Corporation

Paper PS05_05 Using SAS to Process Repeated Measures Data Terry Fain, RAND Corporation Cyndie Gareleck, RAND Corporation Paper PS05_05 Using SAS to Process Repeated Measures Data Terry Fain, RAND Corporation Cyndie Gareleck, RAND Corporation ABSTRACT Data that contain multiple observations per case are called repeated measures

More information

To conceptualize the process, the table below shows the highly correlated covariates in descending order of their R statistic.

To conceptualize the process, the table below shows the highly correlated covariates in descending order of their R statistic. Automating the process of choosing among highly correlated covariates for multivariable logistic regression Michael C. Doherty, i3drugsafety, Waltham, MA ABSTRACT In observational studies, there can be

More information

Beginning Tutorials. Paper 53-27

Beginning Tutorials. Paper 53-27 Paper 53-27 DATA Step vs. What s a neophyte to do? Craig Dickstein, Tamarack Professional Services, Weare, NH Ray Pass, Ray Pass Consulting, Hartsdale, NY ABSTRACT "What's all the buzz about Proc SQL?

More information

Using PROC PLAN for Randomization Assignments

Using PROC PLAN for Randomization Assignments Using PROC PLAN for Randomization Assignments Miriam W. Rosenblatt Division of General Internal Medicine and Health Care Research, University. Hospitals of Cleveland Abstract This tutorial is an introduction

More information

Report Writing, SAS/GRAPH Creation, and Output Verification using SAS/ASSIST Matthew J. Becker, ST TPROBE, inc., Ann Arbor, MI

Report Writing, SAS/GRAPH Creation, and Output Verification using SAS/ASSIST Matthew J. Becker, ST TPROBE, inc., Ann Arbor, MI Report Writing, SAS/GRAPH Creation, and Output Verification using SAS/ASSIST Matthew J. Becker, ST TPROBE, inc., Ann Arbor, MI Abstract Since the release of SAS/ASSIST, SAS has given users more flexibility

More information

Paper DB2 table. For a simple read of a table, SQL and DATA step operate with similar efficiency.

Paper DB2 table. For a simple read of a table, SQL and DATA step operate with similar efficiency. Paper 76-28 Comparative Efficiency of SQL and Base Code When Reading from Database Tables and Existing Data Sets Steven Feder, Federal Reserve Board, Washington, D.C. ABSTRACT In this paper we compare

More information

Going Under the Hood: How Does the Macro Processor Really Work?

Going Under the Hood: How Does the Macro Processor Really Work? Going Under the Hood: How Does the Really Work? ABSTRACT Lisa Lyons, PPD, Inc Hamilton, NJ Did you ever wonder what really goes on behind the scenes of the macro processor, or how it works with other parts

More information

10 The First Steps 4 Chapter 2

10 The First Steps 4 Chapter 2 9 CHAPTER 2 Examples The First Steps 10 Invoking the Query Window 11 Changing Your Profile 11 ing a Table 13 ing Columns 14 Alias Names and Labels 14 Column Format 16 Creating a WHERE Expression 17 Available

More information

Statistics, Data Analysis & Econometrics

Statistics, Data Analysis & Econometrics ST009 PROC MI as the Basis for a Macro for the Study of Patterns of Missing Data Carl E. Pierchala, National Highway Traffic Safety Administration, Washington ABSTRACT The study of missing data patterns

More information

CC13 An Automatic Process to Compare Files. Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ

CC13 An Automatic Process to Compare Files. Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ CC13 An Automatic Process to Compare Files Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ ABSTRACT Comparing different versions of output files is often performed

More information

Merging Data Eight Different Ways

Merging Data Eight Different Ways Paper 197-2009 Merging Data Eight Different Ways David Franklin, Independent Consultant, New Hampshire, USA ABSTRACT Merging data is a fundamental function carried out when manipulating data to bring it

More information

Introduction / Overview

Introduction / Overview Paper # SC18 Exploring SAS Generation Data Sets Kirk Paul Lafler, Software Intelligence Corporation Abstract Users have at their disposal a unique and powerful feature for retaining historical copies of

More information

Guide Users along Information Pathways and Surf through the Data

Guide Users along Information Pathways and Surf through the Data Guide Users along Information Pathways and Surf through the Data Stephen Overton, Overton Technologies, LLC, Raleigh, NC ABSTRACT Business information can be consumed many ways using the SAS Enterprise

More information

A SAS/AF Application for Linking Demographic & Laboratory Data For Participants in Clinical & Epidemiologic Research Studies

A SAS/AF Application for Linking Demographic & Laboratory Data For Participants in Clinical & Epidemiologic Research Studies Paper 208 A SAS/AF Application for Linking Demographic & Laboratory Data For Participants in Clinical & Epidemiologic Research Studies Authors: Emily A. Mixon; Karen B. Fowler, University of Alabama at

More information

PREREQUISITES FOR EXAMPLES

PREREQUISITES FOR EXAMPLES 212-2007 SAS Information Map Studio and SAS Web Report Studio A Tutorial Angela Hall, Zencos Consulting LLC, Durham, NC Brian Miles, Zencos Consulting LLC, Durham, NC ABSTRACT Find out how to provide the

More information

Chaining Logic in One Data Step Libing Shi, Ginny Rego Blue Cross Blue Shield of Massachusetts, Boston, MA

Chaining Logic in One Data Step Libing Shi, Ginny Rego Blue Cross Blue Shield of Massachusetts, Boston, MA Chaining Logic in One Data Step Libing Shi, Ginny Rego Blue Cross Blue Shield of Massachusetts, Boston, MA ABSTRACT Event dates stored in multiple rows pose many challenges that have typically been resolved

More information

SAS Macro Technique for Embedding and Using Metadata in Web Pages. DataCeutics, Inc., Pottstown, PA

SAS Macro Technique for Embedding and Using Metadata in Web Pages. DataCeutics, Inc., Pottstown, PA Paper AD11 SAS Macro Technique for Embedding and Using Metadata in Web Pages Paul Gilbert, Troy A. Ruth, Gregory T. Weber DataCeutics, Inc., Pottstown, PA ABSTRACT This paper will present a technique to

More information

SAS Macro Dynamics - From Simple Basics to Powerful Invocations Rick Andrews, Office of the Actuary, CMS, Baltimore, MD

SAS Macro Dynamics - From Simple Basics to Powerful Invocations Rick Andrews, Office of the Actuary, CMS, Baltimore, MD Paper BB-7 SAS Macro Dynamics - From Simple Basics to Powerful Invocations Rick Andrews, Office of the Actuary, CMS, Baltimore, MD ABSTRACT The SAS Macro Facility offers a mechanism for expanding and customizing

More information

Uncommon Techniques for Common Variables

Uncommon Techniques for Common Variables Paper 11863-2016 Uncommon Techniques for Common Variables Christopher J. Bost, MDRC, New York, NY ABSTRACT If a variable occurs in more than one data set being merged, the last value (from the variable

More information

SAS Scalable Performance Data Server 4.3

SAS Scalable Performance Data Server 4.3 Scalability Solution for SAS Dynamic Cluster Tables A SAS White Paper Table of Contents Introduction...1 Cluster Tables... 1 Dynamic Cluster Table Loading Benefits... 2 Commands for Creating and Undoing

More information

SAS Application to Automate a Comprehensive Review of DEFINE and All of its Components

SAS Application to Automate a Comprehensive Review of DEFINE and All of its Components PharmaSUG 2017 - Paper AD19 SAS Application to Automate a Comprehensive Review of DEFINE and All of its Components Walter Hufford, Vincent Guo, and Mijun Hu, Novartis Pharmaceuticals Corporation ABSTRACT

More information

Are you Still Afraid of Using Arrays? Let s Explore their Advantages

Are you Still Afraid of Using Arrays? Let s Explore their Advantages Paper CT07 Are you Still Afraid of Using Arrays? Let s Explore their Advantages Vladyslav Khudov, Experis Clinical, Kharkiv, Ukraine ABSTRACT At first glance, arrays in SAS seem to be a complicated and

More information

MTA Database Administrator Fundamentals Course

MTA Database Administrator Fundamentals Course MTA Database Administrator Fundamentals Course Session 1 Section A: Database Tables Tables Representing Data with Tables SQL Server Management Studio Section B: Database Relationships Flat File Databases

More information

Now That You Have Your Data in Hadoop, How Are You Staging Your Analytical Base Tables?

Now That You Have Your Data in Hadoop, How Are You Staging Your Analytical Base Tables? Paper SAS 1866-2015 Now That You Have Your Data in Hadoop, How Are You Staging Your Analytical Base Tables? Steven Sober, SAS Institute Inc. ABSTRACT Well, Hadoop community, now that you have your data

More information

Merge Processing and Alternate Table Lookup Techniques Prepared by

Merge Processing and Alternate Table Lookup Techniques Prepared by Merge Processing and Alternate Table Lookup Techniques Prepared by The syntax for data step merging is as follows: International SAS Training and Consulting This assumes that the incoming data sets are

More information

The Dataset Diet How to transform short and fat into long and thin

The Dataset Diet How to transform short and fat into long and thin Paper TU06 The Dataset Diet How to transform short and fat into long and thin Kathryn Wright, Oxford Pharmaceutical Sciences, UK ABSTRACT What do you do when you are given a dataset with one observation

More information

INTRODUCTION TO PROC SQL JEFF SIMPSON SYSTEMS ENGINEER

INTRODUCTION TO PROC SQL JEFF SIMPSON SYSTEMS ENGINEER INTRODUCTION TO PROC SQL JEFF SIMPSON SYSTEMS ENGINEER THE SQL PROCEDURE The SQL procedure: enables the use of SQL in SAS is part of Base SAS software follows American National Standards Institute (ANSI)

More information

Cell Suppression In SAS Visual Analytics: A Primer

Cell Suppression In SAS Visual Analytics: A Primer ABSTRACT Paper 11883-2016 Cell Suppression In SAS Visual Analytics: A Primer Marc Flore, Institute for Health Policy and Practice, University of New Hampshire In healthcare and other fields, the importance

More information

A Practical Introduction to SAS Data Integration Studio

A Practical Introduction to SAS Data Integration Studio ABSTRACT A Practical Introduction to SAS Data Integration Studio Erik Larsen, Independent Consultant, Charleston, SC Frank Ferriola, Financial Risk Group, Cary, NC A useful and often overlooked tool which

More information

CFB: A Programming Pattern for Creating Change from Baseline Datasets Lei Zhang, Celgene Corporation, Summit, NJ

CFB: A Programming Pattern for Creating Change from Baseline Datasets Lei Zhang, Celgene Corporation, Summit, NJ Paper TT13 CFB: A Programming Pattern for Creating Change from Baseline Datasets Lei Zhang, Celgene Corporation, Summit, NJ ABSTRACT In many clinical studies, Change from Baseline analysis is frequently

More information

Understanding and Applying the Logic of the DOW-Loop

Understanding and Applying the Logic of the DOW-Loop PharmaSUG 2014 Paper BB02 Understanding and Applying the Logic of the DOW-Loop Arthur Li, City of Hope National Medical Center, Duarte, CA ABSTRACT The DOW-loop is not official terminology that one can

More information

Mobile MOUSe MTA DATABASE ADMINISTRATOR FUNDAMENTALS ONLINE COURSE OUTLINE

Mobile MOUSe MTA DATABASE ADMINISTRATOR FUNDAMENTALS ONLINE COURSE OUTLINE Mobile MOUSe MTA DATABASE ADMINISTRATOR FUNDAMENTALS ONLINE COURSE OUTLINE COURSE TITLE MTA DATABASE ADMINISTRATOR FUNDAMENTALS COURSE DURATION 10 Hour(s) of Self-Paced Interactive Training COURSE OVERVIEW

More information

WHAT ARE SASHELP VIEWS?

WHAT ARE SASHELP VIEWS? Paper PN13 There and Back Again: Navigating between a SASHELP View and the Real World Anita Rocha, Center for Studies in Demography and Ecology University of Washington, Seattle, WA ABSTRACT A real strength

More information

A Format to Make the _TYPE_ Field of PROC MEANS Easier to Interpret Matt Pettis, Thomson West, Eagan, MN

A Format to Make the _TYPE_ Field of PROC MEANS Easier to Interpret Matt Pettis, Thomson West, Eagan, MN Paper 045-29 A Format to Make the _TYPE_ Field of PROC MEANS Easier to Interpret Matt Pettis, Thomson West, Eagan, MN ABSTRACT: PROC MEANS analyzes datasets according to the variables listed in its Class

More information

Obtaining the Patient Most Recent Time-stamped Measurements

Obtaining the Patient Most Recent Time-stamped Measurements Obtaining the Patient Most Recent Time-stamped Measurements Yubo Gao, University of Iowa Hospitals and Clinics, Iowa City, Iowa Abstract Each time when patient visited clinic, the clinic took several measurements,

More information

Base and Advance SAS

Base and Advance SAS Base and Advance SAS BASE SAS INTRODUCTION An Overview of the SAS System SAS Tasks Output produced by the SAS System SAS Tools (SAS Program - Data step and Proc step) A sample SAS program Exploring SAS

More information

Countdown of the Top 10 Ways to Merge Data David Franklin, Independent Consultant, Litchfield, NH

Countdown of the Top 10 Ways to Merge Data David Franklin, Independent Consultant, Litchfield, NH PharmaSUG2010 - Paper TU06 Countdown of the Top 10 Ways to Merge Data David Franklin, Independent Consultant, Litchfield, NH ABSTRACT Joining or merging data is one of the fundamental actions carried out

More information

Macros for Two-Sample Hypothesis Tests Jinson J. Erinjeri, D.K. Shifflet and Associates Ltd., McLean, VA

Macros for Two-Sample Hypothesis Tests Jinson J. Erinjeri, D.K. Shifflet and Associates Ltd., McLean, VA Paper CC-20 Macros for Two-Sample Hypothesis Tests Jinson J. Erinjeri, D.K. Shifflet and Associates Ltd., McLean, VA ABSTRACT Statistical Hypothesis Testing is performed to determine whether enough statistical

More information

PROGRAMMING ROLLING REGRESSIONS IN SAS MICHAEL D. BOLDIN, UNIVERSITY OF PENNSYLVANIA, PHILADELPHIA, PA

PROGRAMMING ROLLING REGRESSIONS IN SAS MICHAEL D. BOLDIN, UNIVERSITY OF PENNSYLVANIA, PHILADELPHIA, PA PROGRAMMING ROLLING REGRESSIONS IN SAS MICHAEL D. BOLDIN, UNIVERSITY OF PENNSYLVANIA, PHILADELPHIA, PA ABSTRACT SAS does not have an option for PROC REG (or any of its other equation estimation procedures)

More information

PharmaSUG Paper SP04

PharmaSUG Paper SP04 PharmaSUG 2015 - Paper SP04 Means Comparisons and No Hard Coding of Your Coefficient Vector It Really Is Possible! Frank Tedesco, United Biosource Corporation, Blue Bell, Pennsylvania ABSTRACT When doing

More information

Tired of CALL EXECUTE? Try DOSUBL

Tired of CALL EXECUTE? Try DOSUBL ABSTRACT SESUG Paper BB-132-2017 Tired of CALL EXECUTE? Try DOSUBL Jueru Fan, PPD, Morrisville, NC DOSUBL was first introduced as a function in SAS V9.3. It enables the immediate execution of SAS code

More information

Unlock SAS Code Automation with the Power of Macros

Unlock SAS Code Automation with the Power of Macros SESUG 2015 ABSTRACT Paper AD-87 Unlock SAS Code Automation with the Power of Macros William Gui Zupko II, Federal Law Enforcement Training Centers SAS code, like any computer programming code, seems to

More information

T-SQL Training: T-SQL for SQL Server for Developers

T-SQL Training: T-SQL for SQL Server for Developers Duration: 3 days T-SQL Training Overview T-SQL for SQL Server for Developers training teaches developers all the Transact-SQL skills they need to develop queries and views, and manipulate data in a SQL

More information

Efficiently Join a SAS Data Set with External Database Tables

Efficiently Join a SAS Data Set with External Database Tables ABSTRACT Paper 2466-2018 Efficiently Join a SAS Data Set with External Database Tables Dadong Li, Michael Cantor, New York University Medical Center Joining a SAS data set with an external database is

More information

ABSTRACT INTRODUCTION MACRO. Paper RF

ABSTRACT INTRODUCTION MACRO. Paper RF Paper RF-08-2014 Burst Reporting With the Help of PROC SQL Dan Sturgeon, Priority Health, Grand Rapids, Michigan Erica Goodrich, Priority Health, Grand Rapids, Michigan ABSTRACT Many SAS programmers need

More information

Ten Great Reasons to Learn SAS Software's SQL Procedure

Ten Great Reasons to Learn SAS Software's SQL Procedure Ten Great Reasons to Learn SAS Software's SQL Procedure Kirk Paul Lafler, Software Intelligence Corporation ABSTRACT The SQL Procedure has so many great features for both end-users and programmers. It's

More information

Arthur L. Carpenter California Occidental Consultants, Oceanside, California

Arthur L. Carpenter California Occidental Consultants, Oceanside, California Paper 028-30 Storing and Using a List of Values in a Macro Variable Arthur L. Carpenter California Occidental Consultants, Oceanside, California ABSTRACT When using the macro language it is not at all

More information

Choosing the Right Technique to Merge Large Data Sets Efficiently Qingfeng Liang, Community Care Behavioral Health Organization, Pittsburgh, PA

Choosing the Right Technique to Merge Large Data Sets Efficiently Qingfeng Liang, Community Care Behavioral Health Organization, Pittsburgh, PA Choosing the Right Technique to Merge Large Data Sets Efficiently Qingfeng Liang, Community Care Behavioral Health Organization, Pittsburgh, PA ABSTRACT This paper outlines different SAS merging techniques

More information

Let Hash SUMINC Count For You Joseph Hinson, Accenture Life Sciences, Berwyn, PA, USA

Let Hash SUMINC Count For You Joseph Hinson, Accenture Life Sciences, Berwyn, PA, USA ABSTRACT PharmaSUG 2014 - Paper CC02 Let Hash SUMINC Count For You Joseph Hinson, Accenture Life Sciences, Berwyn, PA, USA Counting of events is inevitable in clinical programming and is easily accomplished

More information

How to Incorporate Old SAS Data into a New DATA Step, or What is S-M-U?

How to Incorporate Old SAS Data into a New DATA Step, or What is S-M-U? How to Incorporate Old SAS Data into a New DATA Step, or What is S-M-U? Andrew T. Kuligowski Nielsen Media Research Abstract / Introduction S-M-U. Some people will see these three letters and immediately

More information

SAS System Powers Web Measurement Solution at U S WEST

SAS System Powers Web Measurement Solution at U S WEST SAS System Powers Web Measurement Solution at U S WEST Bob Romero, U S WEST Communications, Technical Expert - SAS and Data Analysis Dale Hamilton, U S WEST Communications, Capacity Provisioning Process

More information

Contents of SAS Programming Techniques

Contents of SAS Programming Techniques Contents of SAS Programming Techniques Chapter 1 About SAS 1.1 Introduction 1.1.1 SAS modules 1.1.2 SAS module classification 1.1.3 SAS features 1.1.4 Three levels of SAS techniques 1.1.5 Chapter goal

More information

David Beam, Systems Seminar Consultants, Inc., Madison, WI

David Beam, Systems Seminar Consultants, Inc., Madison, WI Paper 150-26 INTRODUCTION TO PROC SQL David Beam, Systems Seminar Consultants, Inc., Madison, WI ABSTRACT PROC SQL is a powerful Base SAS Procedure that combines the functionality of DATA and PROC steps

More information

USING PROC SQL EFFECTIVELY WITH SAS DATA SETS JIM DEFOOR LOCKHEED FORT WORTH COMPANY

USING PROC SQL EFFECTIVELY WITH SAS DATA SETS JIM DEFOOR LOCKHEED FORT WORTH COMPANY USING PROC SQL EFFECTIVELY WITH SAS DATA SETS JIM DEFOOR LOCKHEED FORT WORTH COMPANY INTRODUCTION This paper is a beginning tutorial on reading and reporting Indexed SAS Data Sets with PROC SQL. Its examples

More information

Simplifying Your %DO Loop with CALL EXECUTE Arthur Li, City of Hope National Medical Center, Duarte, CA

Simplifying Your %DO Loop with CALL EXECUTE Arthur Li, City of Hope National Medical Center, Duarte, CA PharmaSUG 2017 BB07 Simplifying Your %DO Loop with CALL EXECUTE Arthur Li, City of Hope National Medical Center, Duarte, CA ABSTRACT One often uses an iterative %DO loop to execute a section of a macro

More information

SAS seminar. The little SAS book Chapters 3 & 4. April 15, Åsa Klint. By LD Delwiche and SJ Slaughter. 3.1 Creating and Redefining variables

SAS seminar. The little SAS book Chapters 3 & 4. April 15, Åsa Klint. By LD Delwiche and SJ Slaughter. 3.1 Creating and Redefining variables SAS seminar April 15, 2003 Åsa Klint The little SAS book Chapters 3 & 4 By LD Delwiche and SJ Slaughter Data step - read and modify data - create a new dataset - performs actions on rows Proc step - use

More information

SAS Macro Dynamics: from Simple Basics to Powerful Invocations Rick Andrews, Office of Research, Development, and Information, Baltimore, MD

SAS Macro Dynamics: from Simple Basics to Powerful Invocations Rick Andrews, Office of Research, Development, and Information, Baltimore, MD ABSTRACT CODERS CORNER SAS Macro Dynamics: from Simple Basics to Powerful Invocations Rick Andrews, Office of Research, Development, and Information, Baltimore, MD The SAS Macro Facility offers a mechanism

More information

DSCI 325: Handout 10 Summarizing Numerical and Categorical Data in SAS Spring 2017

DSCI 325: Handout 10 Summarizing Numerical and Categorical Data in SAS Spring 2017 DSCI 325: Handout 10 Summarizing Numerical and Categorical Data in SAS Spring 2017 USING PROC MEANS The routine PROC MEANS can be used to obtain limited summaries for numerical variables (e.g., the mean,

More information

Keh-Dong Shiang, Department of Biostatistics & Department of Diabetes, City of Hope National Medical Center, Duarte, CA

Keh-Dong Shiang, Department of Biostatistics & Department of Diabetes, City of Hope National Medical Center, Duarte, CA Validating Data Via PROC SQL Keh-Dong Shiang, Department of Biostatistics & Department of Diabetes, City of Hope National Medical Center, Duarte, CA ABSTRACT The Structured Query Language (SQL) is a standardized

More information

Get Going with PROC SQL Richard Severino, Convergence CT, Honolulu, HI

Get Going with PROC SQL Richard Severino, Convergence CT, Honolulu, HI Get Going with PROC SQL Richard Severino, Convergence CT, Honolulu, HI ABSTRACT PROC SQL is the SAS System s implementation of Structured Query Language (SQL). PROC SQL can be used to retrieve or combine/merge

More information

1. Join with PROC SQL a left join that will retain target records having no lookup match. 2. Data Step Merge of the target and lookup files.

1. Join with PROC SQL a left join that will retain target records having no lookup match. 2. Data Step Merge of the target and lookup files. Abstract PaperA03-2007 Table Lookups...You Want Performance? Rob Rohrbough, Rohrbough Systems Design, Inc. Presented to the Midwest SAS Users Group Monday, October 29, 2007 Paper Number A3 Over the years

More information