PharmaSUG China 2014 - Paper PO08

SAS2VBA2SAS: Automated solution to string truncation in PROC IMPORT

Amarnath Vijayarangan, Genpact, India

ABSTRACT

PROC IMPORT is one of the most widely used ways to convert flat files that originate from other database engines into SAS datasets. The procedure has its flaws, however. The most commonly encountered issue is that PROC IMPORT may not assign the proper length to character variables: it fixes the length of each character variable from the initial few observations, which leads to string truncation. The proposed approach combines SAS and Excel VBA to resolve the problem without any manual intervention, by reusing the DATA step code that PROC IMPORT writes to the log while importing flat files.

INTRODUCTION

Working with flat files such as CSV and other text files is part of a programmer's everyday work. These files cannot be used directly in SAS; they must first be converted into SAS datasets, and this task is tedious when a dataset has many character variables of varying length. Using PROC IMPORT, either through the Import Wizard or by submitting code, is a quick fix, but it comes with a built-in flaw: PROC IMPORT assigns the length of a character variable based on the first twenty observations that it encounters. The GUESSINGROWS statement increases the number of rows that PROC IMPORT examines before assigning lengths to the character variables. However, GUESSINGROWS applies to the whole dataset being processed, and therefore to every variable in it, which increases run time and disk space utilization.
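For reference, the GUESSINGROWS statement discussed above is written inside the PROC IMPORT step. The sketch below is illustrative only: the demo file name, its contents and the output dataset name are hypothetical, and GUESSINGROWS=MAX assumes a SAS 9.4 session (earlier releases expect an integer row count).

```sas
/* Hypothetical demo file written to the WORK folder */
%let demo=%sysfunc(pathname(work))/guess_demo.csv;

data _null_;
    file "&demo";
    put "id,comment";
    put "1,short";
    put "2,a considerably longer comment value";
run;

proc import datafile="&demo" out=work.guess_demo dbms=csv replace;
    getnames=yes;
    guessingrows=max;   /* scan every row; an integer row count is also accepted */
run;
```

Because the statement has no per-variable form, every column of the file is scanned to the same depth, which is the cost described in the paragraph above.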
The currently available alternative is to write a DATA step with an INFILE statement by hand, or to modify the DATA step generated by PROC IMPORT, but this becomes tedious and time consuming when there are many variables. This paper proposes a solution that addresses these flaws by using SAS together with Excel VBA.

APPROACH

Step 1 uses the D functions (DOPEN, DNUM and DREAD) to identify the flat files that need to be imported. The functions access the path that the user specifies in the filepath parameter and build a list of all the files that need to be read and imported.
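Step 1 can be sketched in isolation as a single DATA step. This is a minimal illustration rather than the paper's listing: it points filepath at the WORK folder so that it runs anywhere (an assumption made only for the demo), widens FileName to 256 characters, tests the last dot-delimited token of the name without regard to case, and closes the directory with DCLOSE.

```sas
/* Assumption for the demo: scan the WORK folder; replace with the folder holding the CSV files */
%let filepath=%sysfunc(pathname(work));

filename fpth "&filepath";

data work.FileList;
    length FileName $ 256;
    dval = dopen('fpth');                  /* 0 means the directory could not be opened */
    if dval > 0 then do;
        do i = 1 to dnum(dval);            /* dnum returns the number of members */
            FileName = dread(dval, i);     /* name of the i-th member */
            /* keep members whose last dot-delimited token is csv, ignoring case */
            if lowcase(scan(FileName, -1, '.')) = 'csv' then output;
        end;
        rc = dclose(dval);
    end;
    keep FileName;
run;

filename fpth clear;
```

If the folder contains no CSV files, work.FileList is created with zero observations and the import loop in Step 2 has nothing to process.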

Step 2 imports all the files that Step 1 identified with the D functions. While the import procedure runs, this step also redirects the procedure's log to a master log file named Logimport.log, which allows the VBA code to access the log even before program execution has completed.

Step 3 uses the Logimport.log file generated in Step 2 (it contains the log of the original PROC IMPORT) as input to identify the optimal length for the character variables. In simple words, the VBA code reads each flat file that was imported and finds the maximum length of each character variable across observations. The VBA code then combines the DATA step program copied from Logimport.log with the optimal lengths it found and creates a vba2sas program containing DATA steps with modified FORMAT and INFORMAT attributes for the character variables. The vba2sas program is then executed through a %INCLUDE statement.

Overall, the code is executed in two stages. First, PROC IMPORT is run and the log of the IMPORT procedure is written to a file named Logimport.log. Then the VBA code scans the flat files being imported and finds the maximum length of each character variable across observations. The VBA code takes the DATA step part of the program in the log, replaces the FORMAT and INFORMAT lengths with the new lengths, and the DATA step is executed again. In this way all the character variables are imported at full length and the issue of string truncation is overcome.

VBA SET UP

The following must be set up before executing the program.

1. Create a workbook named ReadLog.xlsm containing the VBA macro provided in the appendix.
2. Ensure that ReadLog.xlsm has a tab named Output.
3. Save the file in the location specified in the filepath parameter.

SAS2VBA2SAS MACRO

%let filepath=d:\;

%macro sas2vba2sas;

    /* Step 1: list the CSV files found in &filepath */
    filename fpth "&filepath";

    data FileList;
        length FileName $ 50;
        dval=dopen('fpth');
        if dval > 0 then do;
            count=dnum(dval);
            do i=1 to count;
                FileName=dread(dval,i);
                /* compare against upper case, because the extension is upcased */
                if upcase(scan(FileName,2,'.'))='CSV' then output;
            end;
        end;
        keep FileName;
    run;

    /* Step 2: import every listed file, with the log redirected to logimport.log */
    proc printto log="&filepath/logimport.log" new;
    run;

    libname datapath "&filepath";

    %let crb=%sysfunc(open(filelist));
    %let i=1;
    %do %while(not %sysfunc(fetch(&crb)));
        %let Fname=%sysfunc(getvarc(&crb, %sysfunc(varnum(&crb, FileName))));
        proc import datafile="&filepath\&fname" out=datapath.data&i dbms=csv replace;
            getnames=yes;
            datarow=2;
        run;
        %let i=%eval(&i+1);
    %end;
    %let crb=%sysfunc(close(&crb));

    /* restore the default log destination */
    proc printto;
    run;

    /* Step 3: start Excel and run the VBA macro through DDE */
    options noxwait noxsync;

    data _null_;
        rc=system('start EXCEL');
        rc=sleep(5);
    run;

    filename cmds dde 'EXCEL|SYSTEM';

    data _null_;
        file cmds;
        put "[open(""&filepath\readlog.xlsm"")]";
        put '[run("readlogcode")]';
        put "[quit()]";
    run;

    /* give Excel time to write vba2sas.sas, then run the regenerated DATA steps */
    data _null_;
        rc=sleep(5);
    run;

    %include "&filepath/vba2sas.sas";

%mend;

%sas2vba2sas

CONCLUSION

The proposed approach does not restrict itself to a limited number of observations when determining the length of a character variable. It thereby overcomes the string truncation problem of PROC IMPORT, which persists even with GUESSINGROWS because that statement still limits the scan to the user-specified number of observations. Thus, the approach ensures that the quality of the data is not compromised.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Name: Amarnath Vijayarangan
Enterprise: Genpact
Address: Block B Salarpuria Soft Zone Outer Ring Road, Bellandur
City, State ZIP: Bangalore, Karnataka, 560103
E-mail: amarnath7@gmail.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.

APPENDIX

Public LogFolderName As String

Public Sub ReadLogCode()
    Dim LogFileName As String
    Dim FileNum As Integer, tline As String, lngindex As Long, lnglp As Long, splReadLog
    Dim wbksrc As Workbook, intlp As Integer

    Sheets("Output").Activate
    ' Derive the workbook folder (including the trailing separator) from CELL("filename")
    Sheets("Output").Range("A1").Value = "=LEFT(CELL(""filename"",A1),FIND(""["",CELL(""filename"",A1),1)-1)"
    LogFileName = Sheets("Output").Range("A1").Value & "LogImport.log"
    LogFolderName = Sheets("Output").Range("A1").Value
    ActiveSheet.Cells.Clear

    ' Read the whole log into cell A1, one Chr(10)-terminated entry per log line
    lngindex = 1
    FileNum = FreeFile
    Open LogFileName For Input Access Read Shared As #FileNum
    Do While Not EOF(FileNum)
        Line Input #FileNum, tline
        ActiveSheet.Range("A1").Value = ActiveSheet.Range("A1").Value & tline & Chr(10)
    Loop
    Close #FileNum

    If ActiveSheet.Range("A1").Value <> "" Then
        lngindex = 1
        splReadLog = Split(ActiveSheet.Range("A1").Value, Chr(10))
        For lnglp = 0 To WorksheetFunction.CountA(splReadLog) - 1
            ' Copy source lines (those that start with a log line number) to column B
            If (IsNumeric(Trim(Left(splReadLog(lnglp), 8))) = True) And InStr(1, Trim(splReadLog(lnglp)), "SAS") = 0 Then
                ActiveSheet.Range("B" & lngindex).Value = Trim(Mid(splReadLog(lnglp), 6))
                lngindex = lngindex + 1
            ElseIf InStr(1, Trim(UCase(splReadLog(lnglp))), "LIBNAME") > 0 Then
                ActiveSheet.Range("B" & lngindex).Value = Trim(Mid(splReadLog(lnglp), 6))
                lngindex = lngindex + 1
            ElseIf InStr(1, Trim(Left(splReadLog(lnglp), 8)), "!") > 0 And IsNumeric(Replace(Trim(Left(splReadLog(lnglp), 5)), "!", "")) = True Then
                ActiveSheet.Range("B" & lngindex).Value = Trim(Mid(splReadLog(lnglp), 6))
                lngindex = lngindex + 1
            End If
            Debug.Print Trim(Left(splReadLog(lnglp), 8))
        Next
    End If

    ActiveSheet.Range("A1").Value = ""
    Call editcode
    Call CreateSASfile
End Sub

Sub editcode()
    Dim strpos As Long, path As String, wb As Workbook, lrow As Long
    Dim CharNum As Integer, header As String, newnum As Long
    Dim i As Long, j As Long, rngfound As Range

    With ThisWorkbook.Sheets("Output")
        lrow = .Range("B100000").End(xlUp).Row
        For i = 1 To lrow
            If LCase(Left(.Cells(i, 2).Value, 6)) = "infile" Then
                ' Extract the flat-file path from the INFILE statement and open the file
                strpos = InStr(7, .Cells(i, 2).Value, "delimiter", vbTextCompare)
                path = Right(.Cells(i, 2).Value, Len(.Cells(i, 2).Value) - 8)
                If strpos > 0 Then path = Left(path, strpos - 11)
                path = Replace(path, "'", "")
                Set wb = Workbooks.Open(path)
                ThisWorkbook.Activate
                j = i
                Do Until LCase(.Cells(j, 2).Value) = ""
                    ' Rewrite character INFORMAT statements with the longest value in the column
                    If LCase(Left(.Cells(j, 2).Value, 8)) = "informat" Then
                        CharNum = InStr(1, .Cells(j, 2).Value, "$", vbTextCompare)
                        If CharNum <> 0 Then
                            header = Left(.Cells(j, 2).Value, CharNum - 2)
                            header = Right(header, Len(header) - 9)
                            Set rngfound = wb.ActiveSheet.Rows("1:1").Find(What:=header, LookIn:=xlValues, LookAt:=xlPart)
                            If Not rngfound Is Nothing Then
                                wb.ActiveSheet.Range("r1").FormulaR1C1 = "=MAX(LEN(C" & rngfound.Column & "))"
                                wb.ActiveSheet.Range("r1").FormulaArray = wb.ActiveSheet.Range("r1").Formula
                                newnum = wb.ActiveSheet.Range("r1").Value
                                .Cells(j, 2).Value = "informat " & header & " $" & newnum & ". ;"
                                wb.ActiveSheet.Range("r1").ClearContents
                            End If
                        End If
                    End If
                    ' Rewrite character FORMAT statements in the same way
                    If LCase(Left(.Cells(j, 2).Value, 6)) = "format" Then
                        CharNum = InStr(1, .Cells(j, 2).Value, "$", vbTextCompare)
                        If CharNum <> 0 Then
                            header = Left(.Cells(j, 2).Value, CharNum - 2)
                            header = Right(header, Len(header) - 7)
                            Set rngfound = wb.ActiveSheet.Rows("1:1").Find(What:=header, LookIn:=xlValues, LookAt:=xlPart)
                            If Not rngfound Is Nothing Then
                                wb.ActiveSheet.Range("r1").FormulaR1C1 = "=MAX(LEN(C" & rngfound.Column & "))"
                                wb.ActiveSheet.Range("r1").FormulaArray = wb.ActiveSheet.Range("r1").Formula
                                newnum = wb.ActiveSheet.Range("r1").Value
                                .Cells(j, 2).Value = "format " & header & " $" & newnum & ". ;"
                                wb.ActiveSheet.Range("r1").ClearContents
                            End If
                        End If
                    End If
                    j = j + 1
                Loop
                wb.Close False
                i = j
            End If
        Next i
    End With
End Sub

Sub CreateSASfile()
    Dim pth As String
    pth = ThisWorkbook.path
    Dim fs As Object
    Set fs = CreateObject("Scripting.FileSystemObject")
    Dim a As Object
    ' Write the edited program next to the workbook as vba2sas.sas
    Set a = fs.CreateTextFile(LogFolderName & "vba2sas.sas", True)
    Dim sh As Worksheet
    Dim strreadtext As String
    Set sh = ThisWorkbook.Sheets("Output")
    Dim rng As Range
    Set rng = sh.UsedRange
    strreadtext = GetTextFromRangeText(rng)
    a.WriteLine (strreadtext)
    a.Close
End Sub

Function GetTextFromRangeText(ByVal poRange As Range) As String
    Dim vRange As Variant
    Dim sRet As String
    Dim i As Long
    Dim j As Long

    ' Concatenate the cells of each row and separate rows with a line break
    If Not poRange Is Nothing Then
        vRange = poRange
        For i = LBound(vRange) To UBound(vRange)
            For j = LBound(vRange, 2) To UBound(vRange, 2)
                sRet = sRet & vRange(i, j)
            Next j
            sRet = sRet & vbCrLf
        Next i
    End If
    GetTextFromRangeText = sRet
End Function
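To make the end result concrete, the sketch below shows the kind of DATA step that vba2sas.sas is expected to contain once the VBA macro has rewritten the character widths. It is an illustration under stated assumptions, not output captured from the tool: the demo file, its two columns and the dataset name work.data1 are hypothetical, and the width of 35 is simply the length of the longest value in the demo file.

```sas
/* Hypothetical demo file written to the WORK folder */
%let demo=%sysfunc(pathname(work))/vba2sas_demo.csv;

data _null_;
    file "&demo";
    put "id,comment";
    put "1,short";
    put "2,a considerably longer comment value";
run;

/* DATA step in the style that PROC IMPORT writes to the log, with the
   character width replaced by the longest value in the file (35 here) */
data work.data1;
    infile "&demo" delimiter=',' missover dsd lrecl=32767 firstobs=2;
    informat id best32. ;
    informat comment $35. ;
    format id best12. ;
    format comment $35. ;
    input id comment $;
run;
```

Because the INFORMAT and FORMAT widths come from a scan of the whole column, the longest value is read in full regardless of where it appears in the file.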