Getting Classy: A SAS Macro for CLASS Statement Automation

Similar documents
Interpreting Black-Box Machine Learning Models Using Partial Dependence and Individual Conditional Expectation Plots

Getting it Done with PROC TABULATE

ABSTRACT INTRODUCTION MACRO. Paper RF

Automatic Indicators for Dummies: A macro for generating dummy indicators from category type variables

SD10 A SAS MACRO FOR PERFORMING BACKWARD SELECTION IN PROC SURVEYREG

Using Templates Created by the SAS/STAT Procedures

/********************************************/ /* Evaluating the PS distribution!!! */ /********************************************/

The Dataset Diet How to transform short and fat into long and thin

To conceptualize the process, the table below shows the highly correlated covariates in descending order of their R statistic.

Preserving your SAS Environment in a Non-Persistent World. A Detailed Guide to PROC PRESENV. Steven Gross, Wells Fargo, Irving, TX

Using PROC SQL to Generate Shift Tables More Efficiently

The REPORT Procedure: A Primer for the Compute Block

PharmaSUG China. Systematically Reordering Axis Major Tick Values in SAS Graph Brian Shen, PPDI, ShangHai

Creating Macro Calls using Proc Freq

Cleaning Duplicate Observations on a Chessboard of Missing Values Mayrita Vitvitska, ClinOps, LLC, San Francisco, CA

A Practical and Efficient Approach in Generating AE (Adverse Events) Tables within a Clinical Study Environment

Using SAS/SCL to Create Flexible Programs... A Super-Sized Macro Ellen Michaliszyn, College of American Pathologists, Northfield, IL

Figure 1: The PMG GUI on startup

Macros are a block of code that can be executed/called on demand

Unlock SAS Code Automation with the Power of Macros

Avoiding Macros in SAS. Fareeza Khurshed Yiye Zeng

A Format to Make the _TYPE_ Field of PROC MEANS Easier to Interpret Matt Pettis, Thomson West, Eagan, MN

%MAKE_IT_COUNT: An Example Macro for Dynamic Table Programming Britney Gilbert, Juniper Tree Consulting, Porter, Oklahoma

Outlier Detection Using the Forward Search in SAS/IML Studio

Data Grids in Business Rules, Decisions, Batch Scoring, and Real-Time Scoring

Tweaking your tables: Suppressing superfluous subtotals in PROC TABULATE

Data Science Essentials

Automatically Configure SDTM Specifications Using SAS and VBA

A SAS Macro for Producing Benchmarks for Interpreting School Effect Sizes

Paper This paper reviews these techniques and demonstrates them through a series of examples.

A Quick and Gentle Introduction to PROC SQL

Using Python With SAS Cloud Analytic Services (CAS)

2. Smoothing Binning

Want to Do a Better Job? - Select Appropriate Statistical Analysis in Healthcare Research

Checking for Duplicates Wendi L. Wright

Paper Appendix 4 contains an example of a summary table printed from the dataset, sumary.

Submitting SAS Code On The Side

Hooking up SAS and Excel. Colin Harris Technical Director

Using SAS to Analyze CYP-C Data: Introduction to Procedures. Overview

Efficiency Programming with Macro Variable Arrays

Missing Pages Report. David Gray, PPD, Austin, TX Zhuo Chen, PPD, Austin, TX

One Project, Two Teams: The Unblind Leading the Blind

SAS Linear Model Demo. Overview

An Animated Guide: Penalized variable selection techniques in SAS and Quantile Regression

PharmaSUG Paper SP07

Validation Summary using SYSINFO

Greenspace: A Macro to Improve a SAS Data Set Footprint

Going Under the Hood: How Does the Macro Processor Really Work?

Data Preparation for Analytics

An Easy Route to a Missing Data Report with ODS+PROC FREQ+A Data Step Mike Zdeb, FSL, University at Albany School of Public Health, Rensselaer, NY

PharmaSUG Paper AD09

Data Extracts for everybody A SAS Customer Intelligence Studio implementation

Automated Checking Of Multiple Files Kathyayini Tappeta, Percept Pharma Services, Bridgewater, NJ

Paper HOW-06. Tricia Aanderud, And Data Inc, Raleigh, NC

Make the Most Out of Your Data Set Specification Thea Arianna Valerio, PPD, Manila, Philippines

Tracking Dataset Dependencies in Clinical Trials Reporting

Out of Control! A SAS Macro to Recalculate QC Statistics

A Macro to Keep Titles and Footnotes in One Place

WEB MATERIAL. eappendix 1: SAS code for simulation

Getting Started with the SGPLOT Procedure

A Cross-national Comparison Using Stacked Data

A Macro for Systematic Treatment of Special Values in Weight of Evidence Variable Transformation Chaoxian Cai, Automated Financial Systems, Exton, PA

Keeping Track of Database Changes During Database Lock

CC13 An Automatic Process to Compare Files. Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ

Scheme in Scheme: The Metacircular Evaluator Eval and Apply

PASS4TEST. IT Certification Guaranteed, The Easy Way! We offer free update service for one year

= %sysfunc(dequote(&ds_in)); = %sysfunc(dequote(&var));

Automating Preliminary Data Cleaning in SAS

Format-o-matic: Using Formats To Merge Data From Multiple Sources

Smoking and Missingness: Computer Syntax 1

Useful Tips When Deploying SAS Code in a Production Environment

How Do I.? Some Beginners FAQs. Peter Eberhardt, Fernwood Consulting Group Inc. Toronto, Canada Audrey Yeo, Athene USA, West Des Moines, IA

Dictionary.coumns is your friend while appending or moving data

Paper An Automated Reporting Macro to Create Cell Index An Enhanced Revisit. Shi-Tao Yeh, GlaxoSmithKline, King of Prussia, PA

Reducing Credit Union Member Attrition with Predictive Analytics

PhUSE US Connect 2018 Paper CT06 A Macro Tool to Find and/or Split Variable Text String Greater Than 200 Characters for Regulatory Submission Datasets

%MISSING: A SAS Macro to Report Missing Value Percentages for a Multi-Year Multi-File Information System

Common Sense Validation Using SAS

Business Club. Decision Trees

Purchase this book at

Tags, Categories and Keywords

Introduction to the SAS System

Conversion of CDISC specifications to CDISC data specifications driven SAS programming for CDISC data mapping

Creating a Data Shell or an Initial Data Set with the DS2 Procedure

Mapping Clinical Data to a Standard Structure: A Table Driven Approach

An Efficient Tool for Clinical Data Check

SAS STUDIO. JUNE 2014 PRESENTER: MARY HARDING Education SAS Canada. Copyr i g ht 2014, SAS Ins titut e Inc. All rights res er ve d.

SAS Online Training: Course contents: Agenda:

Simulation of Imputation Effects Under Different Assumptions. Danny Rithy

Arthur L. Carpenter California Occidental Consultants, Oceanside, California

Identifying Duplicate Variables in a SAS Data Set

PharmaSUG 2013 CC26 Automating the Labeling of X- Axis Sanjiv Ramalingam, Vertex Pharmaceuticals, Inc., Cambridge, MA

How to use FSBforecast Excel add in for regression analysis

Ranking Between the Lines

SAS Programming Techniques for Manipulating Metadata on the Database Level Chris Speck, PAREXEL International, Durham, NC

Data Quality Review for Missing Values and Outliers

SAS Macro Technique for Embedding and Using Metadata in Web Pages. DataCeutics, Inc., Pottstown, PA

Useful Tips from my Favorite SAS Resources

Poster Frequencies of a Multiple Mention Question

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

Transcription:

Getting Classy: A SAS Macro for CLASS Statement Automation

ABSTRACT When creating statistical models that include multiple covariates, it is important to address which variables are considered categorical and continuous for proper analysis and interpretation in SAS. Example of a Statistical Model using a CLASS statement Below is an example of code which could be used for a Logistic Regression. While the model has multiple covariates included, only two of them are listed in the class statement. Categorical variables, regardless of SAS data type, should be added in the model statement along with an additional class statement. For larger models which contain many continuous or categorical variables, it is easy to overlook variables that need to be added to the class statement. To solve this issue we have created the %MASTERCLASS macro. This macro uses simple inputs including a model variable list and a dataset PROC LOGISTIC DATA=CARS DESCENDING; CLASS DRIVETRAIN(REF='All') ORIGIN(REF='Asia')/PARAM=REF; MODEL UNDER25K = ORIGIN MPG_CITY WEIGHT DRIVETRAIN ENGINESIZE HORSEPOWER MPG_HIGHWAY WEIGHT WHEELBASE LENGTH; This is a simple process when you are familiar with your dataset or have a small list of covariates considered in a model. to create automatically generated class and model Variable listings for modeling. This becomes much more work when multiple models, or large numbers with small changes inside each model, or models with an extremely large number of covariates. Our macro helps improve efficiency and quality of class statement creation. Graphic from clipartfox.com

EXAMPLE DATASET: SASHELP.CARS The example used for this macro will be shown with the SASHELP.CARS dataset with a small modification. Variable names, types, and formats are listed below. Here s a bit of background on the CARS dataset: Includes 16 Variables: 5 character types 10 numeric type s 1 numeric with a small number of levels 1 created binary variable (Created based on MSRP </ $25,000) Mock examples show modeling for MSRP or costs above or below 25k MSRP. INPUTS AND FINAL OUTPUTS All you need is the dataset of interest, the %MasterCLASS macro, the model of interest covariates. From there, the macro is simple to use with the example prompt listed below: Input: (Code) %MasterCLASS(DATA = <dataset>, VARS = <variables>, LEVELS = <number of levels>); (Example) %MasterCLASS(DATA=SASHELP.CARS, VARS=ORIGIN MAKE TYPE MPG_CITY WEIGHT Cylinders DriveTrain EngineSize Horsepower mpg_highway weight wheelbase length); Output: Three macro variables: &CLASS. &VARIABLE. &MODEL. %PUT &MODEL.; Cylinders DriveTrain Make Origin Type MPG_City Weight EngineSize Horsepower MPG_Highway Weight Wheelbase Length % PUT &CLASS.; Cylinders(REF='3') DriveTrain(REF='All') MAKE(REF='Acura') Origin (REF='Asia') Type(REF='Hybrid') % PUT &VARIABLE.; Cylinders DriveTrain Make Origin Type Graphics from clipartfox.com

Macro information and code available by phone! Very little of the details are needed to work this macro! However, behind the scenes this is what s going on: User identifies dataset and variables of interest Metadata is gathered from PROC CONTENTS The metadata variables are separated by numeric and character types Logic checks for variables assigned as numeric which should be identified as categorical types to be included in CLASS statement e.g. 0/1 binary variables or the Cylinders variable within SASHELP.CARS Unique levels are obtained for all variables in the class statement %MacroCLASS code can be found with the QR code above or at https://tinyurl.com/h6mcd45 Macro variables are created for class and model statement. Auto generated references groups (i.e. Type(REF='Hybrid )) can be auto-included for the CLASS statement End user just needs to know macro prompts!

%MasterCLASS code: everything needed for an automated class statement! This code is also available in the published paper and online at the QR code posted on the previous slide. %MACRO MasterCLASS(DATA=, VARS=, LEVELS=10); /*Grabs metadata from the dataset of interest*/ PROC CONTENTS DATA = &DATA out = OUTCONTENT noprint; /*Creates empty datasets for later use*/ DATA _VARS; LENGTH NAME $255.; DATA CAT; LENGTH NAME $255.; DATA CONT; LENGTH NAME $255.; /*Count the number of variables of interest*/ %LET CNT1 = %SYSFUNC(countw(&VARS,' ')); /*Setting up the do loop through the variable list*/ %DO I = 1 %TO &CNT1; /*This will scan through the list of &VARS. one at a time separated by a space, until it reaches the end.*/ %LET VAR = %SCAN(&VARS.,&I.,' '); /*Get the count of levels to determine type of numeric*/ PROC SQL; CREATE TABLE _A AS SELECT DISTINCT "&VAR" AS NAME, COUNT(DISTINCT &VAR) as LEVELS FROM &DATA; /*Create a table where variable information regarding levels is kept*/ DATA _VARS; SET _VARS _A; /*Combine level counts with data type for the categorical variables that have many levels*/ PROC SQL; CREATE TABLE _VARS1 AS SELECT A.NAME, A.LEVELS, B.TYPE FROM _VARS A, OUTCONTENT B WHERE UPCASE(A.NAME) = UPCASE(B.NAME); /*If the number of levels < than the minimum we defined OR the type is string then they are lumped into categorical*/ DATA CONT CAT; SET _VARS1; IF LEVELS < &LEVELS OR TYPE = 2 THEN OUTPUT CAT; ELSE OUTPUT CONT; INTO :CATROW ; INTO :CONTROW FROM CONT; /*Now the continuous variables are noted*/ %IF &CONTROW GE 1 %THEN %DO; SELECT NAME INTO :CONTINUOUS SEPARATED BY " " FROM CONT; %IF &CONTROW GE 1 %THEN %DO; /*Combine two categorical types: those that are numeric and those that are not.*/ SELECT NAME INTO :CATEGORICAL1 SEPARATED BY " " WHERE TYPE = 1; SELECT NAME INTO :CATEGORICAL2 SEPARATED BY " " WHERE TYPE = 2; /*Gather counts for both of those groups*/ INTO :CNT2 WHERE TYPE = 1; INTO :CNT3 WHERE TYPE = 2; /*First go through the string categorical variables*/ %IF &CNT3 GE 1 %THEN %DO; DATA CAT2; /*LENGTH LEVEL $255.;*/ SET &DATA; ARRAY CATVAR[&CNT3] &CATEGORICAL2; DO J = 1 TO &CNT3; VARIABLE = SCAN("&CATEGORICAL2",J,' '); LEVEL = CATVAR[j]; OUTPUT; END; KEEP VARIABLE LEVEL; %ELSE %DO; DATA CAT2; %IF &CNT2 GE 1 %THEN %DO; /*Now with numeric variables, since type must match since the arrays cannot handle different types*/ DATA CAT3; SET &DATA; ARRAY CATVAR[&CNT2] &CATEGORICAL1; DO J = 1 TO &CNT2; VARIABLE = SCAN("&CATEGORICAL1",J,' '); LEVEL = COMPRESS(INPUT(CATVAR[j],$255.)); LEVRAW = CATVAR[j]; OUTPUT; END; KEEP VARIABLE LEVEL LEVRAW; %ELSE %DO; DATA CAT3; DATA CATCOMB; LENGTH LEVEL $255.; SET CAT2 CAT3; IF LEVRAW =. THEN LEVRAW = 0; /*Combines the results and defining the variable references*/ CREATE TABLE CAT_ALL AS SELECT DISTINCT VARIABLE, LEVEL, LEVRAW, COMPRESS(VARIABLE "(REF='" LEVEL "')") AS CLASS COMB WHERE LEVEL NE '' GROUP BY VARIABLE ORDER BY 1,3,2; DATA CAT_ALL; SET CAT_ALL; BY VARIABLE; IF FIRST.VARIABLE NE 1 THEN DELETE; %GLOBAL MODEL CLASS VARIABLE; /*These are the macro variables we will use in our model*/ SELECT VARIABLE, CLASS INTO :VARIABLE SEPARATED BY ' ', :CLASS SEPARATED BY ' ' _ALL; %LET MODEL = &VARIABLE &CONTINUOUS; /*Housekeeping: Delete all tables except the working table we are using*/ PROC DATASETS noprint; DELETE CAT CAT2 CAT3 CAT_ALL _VARS _VARS1 CONT OUTCONTENT _A; /*Housekeeping: Clear macro variables used during run in case of re-run*/ %LET CATROW =; %LET CONTROW =; %LET CONTINUOUS =; %LET CATEGORICAL1 =; %LET CATEGORICAL2 =; %LET CNT2=; %LET CNT3=; %MEND;