SAS Linear Model Demo. Overview

Similar documents
Chapter 6: Modifying and Combining Data Sets

Keeping Track of Database Changes During Database Lock

How to Keep Multiple Formats in One Variable after Transpose Mindy Wang

A Cross-national Comparison Using Stacked Data

Mapping Clinical Data to a Standard Structure: A Table Driven Approach

SD10 A SAS MACRO FOR PERFORMING BACKWARD SELECTION IN PROC SURVEYREG

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

Paper CC06. a seed number for the random number generator, a prime number is recommended

Importing Excel into SAS: A Robust Approach for Difficult-To-Read Worksheets

Paper An Automated Reporting Macro to Create Cell Index An Enhanced Revisit. Shi-Tao Yeh, GlaxoSmithKline, King of Prussia, PA

There s No Such Thing as Normal Clinical Trials Data, or Is There? Daphne Ewing, Octagon Research Solutions, Inc., Wayne, PA

Automate Clinical Trial Data Issue Checking and Tracking

Don t Forget About SMALL Data

From Manual to Automatic with Overdrive - Using SAS to Automate Report Generation Faron Kincheloe, Baylor University, Waco, TX

Using Templates Created by the SAS/STAT Procedures

PASS4TEST. IT Certification Guaranteed, The Easy Way! We offer free update service for one year

Statistics and Data Analysis. Common Pitfalls in SAS Statistical Analysis Macros in a Mass Production Environment

PharmaSUG Paper AD06

SAS (Statistical Analysis Software/System)

Want to Do a Better Job? - Select Appropriate Statistical Analysis in Healthcare Research

Facilitate Statistical Analysis with Automatic Collapsing of Small Size Strata

Using SAS/SCL to Create Flexible Programs... A Super-Sized Macro Ellen Michaliszyn, College of American Pathologists, Northfield, IL

Using SAS Macro to Include Statistics Output in Clinical Trial Summary Table

Clinical Data Visualization using TIBCO Spotfire and SAS

PhUSE US Connect 2018 Paper CT06 A Macro Tool to Find and/or Split Variable Text String Greater Than 200 Characters for Regulatory Submission Datasets

Top Coding Tips. Neil Merchant Technical Specialist - SAS

Greenspace: A Macro to Improve a SAS Data Set Footprint

SAS Online Training: Course contents: Agenda:

USING SAS SOFTWARE TO COMPARE STRINGS OF VOLSERS IN A JCL JOB AND A TSO CLIST

INTRODUCTION TO SAS HOW SAS WORKS READING RAW DATA INTO SAS

2 = Disagree 3 = Neutral 4 = Agree 5 = Strongly Agree. Disagree

186 Statistics, Data Analysis and Modeling. Proceedings of MWSUG '95

Paper DB2 table. For a simple read of a table, SQL and DATA step operate with similar efficiency.

Using SAS to Analyze CYP-C Data: Introduction to Procedures. Overview

Cover the Basics, Tool for structuring data checking with SAS Ole Zester, Novo Nordisk, Denmark

Contents of SAS Programming Techniques

Using GSUBMIT command to customize the interface in SAS Xin Wang, Fountain Medical Technology Co., ltd, Nanjing, China

Getting Classy: A SAS Macro for CLASS Statement Automation

Data Quality Review for Missing Values and Outliers

Data Preparation for Analytics

SAS (Statistical Analysis Software/System)

Using PROC SQL to Generate Shift Tables More Efficiently

AURA ACADEMY SAS TRAINING. Opposite Hanuman Temple, Srinivasa Nagar East, Ameerpet,Hyderabad Page 1

Validation Summary using SYSINFO

A Simple Framework for Sequentially Processing Hierarchical Data Sets for Large Surveys

CREATING A SUMMARY TABLE OF NORMALIZED (Z) SCORES

Report Writing, SAS/GRAPH Creation, and Output Verification using SAS/ASSIST Matthew J. Becker, ST TPROBE, inc., Ann Arbor, MI

A SAS Macro for Producing Benchmarks for Interpreting School Effect Sizes

PharmaSUG Paper TT11

Implementing external file processing with no record delimiter via a metadata-driven approach

Paper S Data Presentation 101: An Analyst s Perspective

Creating output datasets using SQL (Structured Query Language) only Andrii Stakhniv, Experis Clinical, Ukraine

Common Sense Validation Using SAS

Tracking Dataset Dependencies in Clinical Trials Reporting

Open Problem for SUAVe User Group Meeting, November 26, 2013 (UVic)

A Practical and Efficient Approach in Generating AE (Adverse Events) Tables within a Clinical Study Environment

INTRODUCTION TO PROC SQL JEFF SIMPSON SYSTEMS ENGINEER

Data Manipulations Using Arrays and DO Loops Patricia Hall and Jennifer Waller, Medical College of Georgia, Augusta, GA

Missing Pages Report. David Gray, PPD, Austin, TX Zhuo Chen, PPD, Austin, TX

Advanced Visualization using TIBCO Spotfire and SAS

A Tool to Compare Different Data Transfers Jun Wang, FMD K&L, Inc., Nanjing, China

work.test temp.test sasuser.test test

%Addval: A SAS Macro Which Completes the Cartesian Product of Dataset Observations for All Values of a Selected Set of Variables

A Macro that can Search and Replace String in your SAS Programs

SAS CLINICAL SYLLABUS. DURATION: - 60 Hours

The Dataset Diet How to transform short and fat into long and thin

Macros for Two-Sample Hypothesis Tests Jinson J. Erinjeri, D.K. Shifflet and Associates Ltd., McLean, VA

Base and Advance SAS

SAS Training BASE SAS CONCEPTS BASE SAS:

Make the Most Out of Your Data Set Specification Thea Arianna Valerio, PPD, Manila, Philippines

DSCI 325: Handout 10 Summarizing Numerical and Categorical Data in SAS Spring 2017

Implementing CDISC Using SAS. Full book available for purchase here.

Macros to Manage your Macros? Garrett Weaver, University of Southern California, Los Angeles, CA

Please login. Procedures for Data Insight. overview. Take a seat at one of the work stations Login with your HawkID Locate SAS 9.3 in the Start Menu

An Efficient Tool for Clinical Data Check

Creating a Data Shell or an Initial Data Set with the DS2 Procedure

How to use UNIX commands in SAS code to read SAS logs

An Algorithm to Compute Exact Power of an Unordered RxC Contingency Table

ABSTRACT INTRODUCTION MACRO. Paper RF

STATION

1 Files to download. 3 Macro to list the highest and lowest N data values. 2 Reading in the example data file

Seminar Series: CTSI Presents

Creating Macro Calls using Proc Freq

%MAKE_IT_COUNT: An Example Macro for Dynamic Table Programming Britney Gilbert, Juniper Tree Consulting, Porter, Oklahoma

Figure 1. Table shell

Please login. Take a seat Login with your HawkID Locate SAS 9.3. Raise your hand if you need assistance. Start / All Programs / SAS / SAS 9.

TRANSFORMING MULTIPLE-RECORD DATA INTO SINGLE-RECORD FORMAT WHEN NUMBER OF VARIABLES IS LARGE.

Ditch the Data Memo: Using Macro Variables and Outer Union Corresponding in PROC SQL to Create Data Set Summary Tables Andrea Shane MDRC, Oakland, CA

Assignment 5.5. Nothing here to hand in

A SAS MACRO for estimating bootstrapped confidence intervals in dyadic regression

T.I.P.S. (Techniques and Information for Programming in SAS )

%MISSING: A SAS Macro to Report Missing Value Percentages for a Multi-Year Multi-File Information System

22S:172. Duplicates. may need to check for either duplicate ID codes or duplicate observations duplicate observations should just be eliminated

Techdata Solution. SAS Analytics (Clinical/Finance/Banking)

Quality Control in SAS : Checking Input, Work, and Output

ABSTRACT INTRODUCTION WORK FLOW AND PROGRAM SETUP

SAS Macros for Grouping Count and Its Application to Enhance Your Reports

ssh tap sas913 sas

Cleaning Duplicate Observations on a Chessboard of Missing Values Mayrita Vitvitska, ClinOps, LLC, San Francisco, CA

PharmaSUG China. Systematically Reordering Axis Major Tick Values in SAS Graph Brian Shen, PPDI, ShangHai

Transcription:

SAS Linear Model Demo Yupeng Wang, Ph.D, Data Scientist Overview SAS is a popular programming tool for biostatistics and clinical trials data analysis. Here I show an example of using SAS linear regression models to detect weight change-associated genes from a microarray dataset with gender and race factors corrected. Microarray data contain expression profiles of tens of thousands of genes in different samples or under multiple conditions, so the dataset of this demo is big. My SAS program shows a general procedure to deal with big datasets. The source code can be downloaded from https://github.com/wyp1125/sas-biostatistics- Toolset/blob/master/microarray_glm.sas. Key techniques: SAS, macro, macro variable, importing data, proc format, proc contents, proc sql, proc transpose, merging datasets, do loop, p-value, multiple testing correction, FDR 1. Raw data Detailed procedure A small part of the raw data is available at https://github.com/wyp1125/sas-biostatistics- Toolset/blob/master/exp_data.xlsx. At the top are the metadata rows, starting with!. As we want to build linear regression models of weight change on gene expression value, gender and race, we need to extract weight change, gender and race values from the metadata. The row starting with ID_REF contains sample IDs, while subsequent rows are expression values, with the first column containing gene (probe_set) IDs. 1

2. Importing data proc import datafile='/folders/myfolders/gene/exp_data.xlsx' out=temp dbms=xlsx replace; sheet=sheet2; getnames=no; Because column names are not imported, A-AA are automatically assigned to them. 3. Processing raw data to generate weight change, gender, race and gene expression datasets We first define some formats to facilitate conversion of gender and race values: proc format; value $ frace 'African American'=1 'Caucasian'=2; value $ gd 'M'=1 'F'=2; We divide raw data into three datasets: sam_ch for metadata, sam_id for sample IDs, and exprs for gene expression: data exprs sam_ch sam_id; set temp; if anydigit(substr(a,1,1),1)>0 then do; rename A=id; output exprs; else if find(a,"characteristics")>0 then output sam_ch; else if A='ID_REF' then output sam_id; Retrieve column names from exprs and save them in dataset vars, and then create new column names to facilitate data type conversion: proc contents data=exprs out=vars(keep=name type); data vars; set vars; if type=2 and name ne 'id'; newname=trim(left(name)) "_n"; 2

Assign old and new column names to macro variables: proc contents data=exprs out=vars(keep=name type); data vars; set vars; if type=2 and name ne 'id'; newname=trim(left(name)) "_n"; Convert character data type to numeric data type for weight change and save the weight change data as the dataset pheno: data pheno; set sam_ch; where B contains 'weight'; id="weight"; array nu(*) &n_list; do it=1 to dim(ch); nu(it)=input(substr(ch(it),24),percent.); drop it &c_list; rename &renam_list; drop it &c_list; rename &renam_list; Generate the dataset gender: data gender; set sam_ch; where B contains 'gender'; id='gender'; do it=1 to dim(ch); ch(it)=trim(left(substr(ch(it),9,1))); format &c_list gd.; drop it; 3

Generate the dataset race: data race; set sam_ch; where B contains 'race'; id="race"; do it=1 to dim(ch); ch(it)=trim(left(substr(ch(it),6))); format &c_list frace.; drop it; Convert character data type to numeric data type for gene expression values and save gene expression data as the dataset exprs1: data exprs1; set exprs; array nu(*) &n_list; do it= 1 to dim(ch); nu(it)=input(ch(it),8.); drop it &c_list; rename &renam_list; run 4. Merging weight change, gender, race and gene expression datasets into one dataset for statistical analysis Transpose individual datasets: proc transpose data=pheno out=pheno1 (drop=_label_); proc transpose data=gender out=gender1 (drop=_label_); proc transpose data=race out=race1 (drop=_label_); proc transpose data=exprs1 out=exprs2 (drop=_label_); 4

Sort individual datasets: proc sort data=pheno1; proc sort data=gender1; proc sort data=race1; proc sort data=exprs2; Merge individual datasets into one dataset stat for statistical analysis: data stat; merge pheno1 gender1 race1 exprs2; In dataset stat, rows represent samples, while columns represent weight change, gender, race and genes. 5. Deriving some macro variables to facilitate statistical analysis Output the variables of the stat dataset: proc contents data=stat out=stat_var (keep=name) noprint; Compute the total number of genes and assign it to a macro variable nprob: data _NULL_; dsid = open("stat_var"); obss = attrn(dsid,"nlobs")-4; call symput('nprob',compress(put(obss,5.))); 5

Retrieve all gene names and assign them to macro variables: proc sql noprint; select name into: pid1- :pid&nprob from stat_var where name not in ('_name_','gender','race','weight'); 6. Statistical analysis: detecting weight change-associated genes Make a macro comput to 1) loop through each gene (columns of stat) to build a linear regression model of weight change on gene expression value, race and gender, and retrieve the p-value for the coefficient of gene expression value. 2) save the names and p-values of all genes as a dataset allp. %macro comput; %do it=1 %to &nprob; proc glm data=stat outstat=ttt noprint; class gender race; model weight=race gender &&pid⁢ data _NULL_; ln=4; set ttt point=ln; call symputx("pva" compress(put(&it,5.)),prob); stop; % data allp; %do ii=1 %to &nprob; gene="&&pid&ii"; pvalue="&&pva&ii"; output; % %m %comput; Process the allp dataset for multiple testing correction: data allp1; set allp; raw_p=input(pvalue,best.); drop pvalue; 6

Correct p-values by the FDR approach: proc multtest inpvalues=allp1 FDR out=allp_adj; Print the genes associated with weight change according to FDR-adjusted p-value<0.1: proc print data=allp_adj; where fdr_p<0.1; 7