Title: Using SAS to determine file and space usage in UNIX
Author: Mike Montgomery [MIS Manager, MTN (South Africa)]

Abstract

The paper will show tools developed to manage a proliferation of SAS files and directories in a UNIX environment. The tools were used to determine how and by whom approximately 2 terabytes of disk space was being used, how the usage was likely to grow, and how much of the space was being occupied without being used. Benefits derived from the tools included:

- an objective basis for a possible internal charge-back policy for the use of disk space,
- a basis for an archiving policy,
- essential information for an overdue V6-V8 conversion plan,
- essential information for a data warehousing project.

The paper will also show techniques that extended the ODS HTML output beyond what is possible through its standard usage.

Background

When taking over responsibility for the MIS (Management Information Systems) department, I was presented with a huge and unstructured collection of business data that had grown over the seven years of the company's existence. This data is used to generate much of the reporting for the business. The SAS datasets, indexes and programs had accumulated during the rapid growth of a young company in the highly competitive telecommunications industry. This growth happened with less than adequate documentation.

The SAS environment consisted of approximately 2 terabytes of files (excluding files that had been moved offline). These files were created by several developers (some of whom have come and gone), and were scattered over many directories (exactly how many only became known after applying the utilities described in this paper). Using the utilities discussed, it has become possible to get a thorough understanding and much better control of the environment.

Summary of method

It was known which disks were used to store SAS files. The utility does the following:

- Determines the sub-directories under the root directory of each disk.
- For each directory and sub-directory, searches for files with selected extensions. The extensions used by SAS are provided in SAS documentation.
- For each file found, determines (through operating system commands) the owner of the file, when the file was created, when modified, when last used, and the size of the file.
- For each version 6 and version 8 dataset, determines details about the dataset and the variables involved (using PROC DATASETS and PROC CONTENTS).
- For each dataset, determines whether it is unique in terms of the combination of variables it contains. If it is not unique, determines what other datasets (having the same combination of variables) it can be grouped with.

The details gathered above were exported to a Windows NT SAS session, from where static HTML pages were created using the SAS Output Delivery System (ODS). The utility extended the use of the ODS to present output in ways that are not possible with the conventional use of the ODS. In particular, output from different and dissimilar procedures can be grouped together at the discretion of the developer and arranged to any level of nesting.

The data gathered from the UNIX environment is stored for month-on-month comparisons of the environment, and for estimating the future growth of the environment.
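The grouping step in the list above can be illustrated with a short sketch. It is written in Python rather than SAS and is not the author's code. It assumes that a "combination of variables" means the set of variable names only (ignoring order, type and length), and the dataset and variable names are invented for illustration.

```python
from collections import defaultdict


def group_by_variables(datasets):
    """Map each distinct set of variable names to the datasets that use it.

    `datasets` maps a dataset name to an iterable of its variable names.
    """
    groups = defaultdict(list)
    for name, variables in datasets.items():
        # SAS variable names are case-insensitive, so compare in upper case.
        key = frozenset(v.upper() for v in variables)
        groups[key].append(name)
    return groups


# Hypothetical inventory, as might be collected from PROC CONTENTS output.
inventory = {
    "CALLS_JAN": ["msisdn", "call_date", "duration"],
    "CALLS_COPY": ["DURATION", "MSISDN", "CALL_DATE"],
    "TARIFFS": ["tariff", "rate"],
}

groups = group_by_variables(inventory)

# Datasets sharing a variable combination form a group; the rest are unique.
shared = sorted(sorted(names) for names in groups.values() if len(names) > 1)
unique = sorted(names[0] for names in groups.values() if len(names) == 1)
print("groups:", shared)
print("unique:", unique)
```

Using an unordered set as the key means that two datasets holding the same variables in a different order still fall into the same group.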

Example of results

Example A: Space usage

This shows detail of the space used per user (owner), so the user with the highest usage can be identified. The report allows drilling down to see which files are associated with a user. A similar presentation with drill-down is available per file type, per directory, and per department (implied from the user name). This provides a basis for applying a charge-back policy per department.

In drilling down to the files belonging to a specific user, results similar to the following are shown. [Included in the display are: date created, date changed, and date last used.]

This shows some datasets as being unique and others as belonging to a group, here group 284. The grouping is based on the combination of variables involved. It is possible to drill down to see which datasets are unique, which other datasets are in group 284 (i.e. which other datasets use exactly the same combination of variables), or which variables are involved. Drilling down to group 284 displays the following.

It shows that the datasets named MANY and UC_DUPS are related to each other even though they are named differently.

Example B: Unused files

This shows a summary of how much space has not been used since the month indicated. Detail of which files are involved can be selected. The information was used to determine an archiving policy. The display also shows how the index to output from different PROCs can be arranged to any level of indentation desired.
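A summary like the one in Example B could be derived from the last-used dates roughly as follows. This is a Python sketch, not the SAS used by the utility. The function names and the sample sizes and dates are invented, and "not used since a month" is assumed to mean that the last-used date falls in or before that month.

```python
from collections import defaultdict
from datetime import date


def unused_space_by_month(files):
    """Total bytes per (year, month) in which the files were last used.

    `files` is an iterable of (size_in_bytes, last_used_date) pairs.
    """
    totals = defaultdict(int)
    for size, last_used in files:
        totals[(last_used.year, last_used.month)] += size
    return dict(totals)


def unused_since(totals, year, month):
    """Bytes held by files whose last use was in or before the given month."""
    return sum(size for ym, size in totals.items() if ym <= (year, month))


# Hypothetical (size, last-used date) pairs.
files = [
    (364, date(2000, 5, 29)),
    (200, date(2000, 5, 23)),
    (321, date(2000, 6, 1)),
    (32, date(2001, 7, 16)),
]

totals = unused_space_by_month(files)
for ym in sorted(totals):
    print(ym, totals[ym], unused_since(totals, *ym))
```

Keeping the per-month totals separate from the cumulative figure makes it possible to show both the summary line and the drill-down to the files of a single month.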

Example C: Growth

This shows a summary of the growth since the previous month. A simple calculation (assuming the same growth or decline each month) was used to determine the implied annual growth and what the disk space usage will be at selected points in the future. After the current round of deleting unnecessary files, a more appropriate growth forecast can be done. The information from this output is useful for capacity planning.

The other options under section 6 allow monitoring of the growth per file type, per user, per directory and per department. This makes it possible to see exactly where the space usage is growing or declining, so that the manager knows where action needs to be taken. The growth per user is shown below.
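The paper does not spell out the growth formula used in Example C. The Python sketch below shows two readings of "the same growth/decline each month": a constant absolute change and a constant percentage change. Both function names and the usage figures are assumptions made for illustration.

```python
def project_linear(previous, current, months_ahead):
    """Usage after `months_ahead` months if the same absolute change repeats."""
    return current + (current - previous) * months_ahead


def project_compound(previous, current, months_ahead):
    """Usage after `months_ahead` months if the same percentage change repeats."""
    return current * (current / previous) ** months_ahead


# Hypothetical usage in gigabytes for the previous and the current month.
previous, current = 2000, 2040

linear_year = project_linear(previous, current, 12)
compound_year = project_compound(previous, current, 12)

print("implied annual growth (linear):", linear_year - current)
print("usage in 12 months (linear):", linear_year)
print("usage in 12 months (compound):", round(compound_year, 2))
```

The two readings diverge as the horizon lengthens, so a forecast for capacity planning should state which one it uses.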

Example D: Index to datasets, variables, programs etc.

The items in section 8 make it possible to easily find all occurrences (across multiple directories) of datasets or programs with particular names, and to find which datasets contain variables with a particular name or label. This helps to locate unnecessary duplication of datasets and multiple versions of programs, and to identify which datasets to use when analysing selected variables. An extract of the duplicate datasets is shown.

Benefits achieved

- Automatically generated documentation.
- Understanding the extent of the environment being managed.
- Finding unused files.
- Finding duplicate SAS datasets.
- Finding SAS programs with the same name.
- Ability to direct users to what files can be deleted.
- Finding datasets having a particular variable.
- Finding datasets having a variable with a particular label.

Future work

Extending the analysis to a VMS environment.
The company has some data in a VMS environment. The utility will be copied to the VMS environment. The UNIX-specific system commands need to be changed to VMS equivalents.

Extending the grouping of datasets.

The grouping of datasets based on the combinations of variables they contain has been useful in identifying which datasets are possibly redundant. This will be extended to identify cases where the variables in one group are a subset of the variables in another group. Datasets in the group with the smaller set of variables could then be scrapped in favour of the datasets in the group with the larger set of variables.

Extending the analysis to non-SAS files.
As a result of interest from operations management, the techniques will be applied to build up similar documentation on space usage by non-SAS files in the UNIX environment. All that needs to change is the file extensions searched for.

Appendix 1: An extract of a macro to determine sub-directories of a UNIX directory.

The macro caters for path names that contain blanks by enclosing them in quotes in the UNIX command. It creates a dataset containing the names of the sub-directories.

Example of UNIX command generated:

    ls -lR '/usr/users/name of path' | grep ./ > xxx.txt

Global variables used:

    &pgmroot   name of path from where the utility is run.
    &maxlen    maximum length of path names.

    %macro subdirs(root,out);
      %local file;
      %let file=&pgmroot/xxsubdirs.txt;

      /* list the directory tree recursively, keeping only lines that contain a path */
      data _null_;
        call system("ls -lR '&root' | grep ./ > &file");
      run;

      data &out baddir;
        infile "&file" lrecl=&maxlen pad;
        length path $ &maxlen;
        input path $ 1-&maxlen;
        /* remove the colon that ends each directory header */
        path=left(trim(compress(path,':')));
        if substr(path,1,1) = '/' then output &out;
        else if substr(path,1,1) = 'l' then link logical;
        else output baddir;
        return;
      logical: /** decide how to handle logical links. **/
        delete;
        return;
      run;
    %mend;

Sample of file created and then read by the macro above.
    /usr/users/sasuser/production:
    /usr/users/sasuser/production/checkfiles:
    /usr/users/sasuser/production/cmt_programs:
    /usr/users/sasuser/production/cmt_programs_production:
    /usr/users/sasuser/production/cmt_scripts:
    /usr/users/sasuser/production/datfiles:
    /usr/users/sasuser/production/itsv:
    /usr/users/sasuser/production/kpi:
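The classification that the second DATA step of the macro applies to each line of this file can be sketched outside SAS as follows. This is a Python illustration, not part of the utility. The function name is invented, and the symbolic-link and error lines in the sample are hypothetical additions to the directory lines shown above.

```python
def classify_listing(lines):
    """Split filtered recursive-listing output into directories, links and other lines."""
    directories, links, other = [], [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            # Directory headers look like "/path/to/dir:"; drop the closing colon.
            directories.append(line.rstrip(":"))
        elif line.startswith("l"):
            # A long-format entry starting with "l" is a symbolic link.
            links.append(line)
        else:
            other.append(line)
    return directories, links, other


sample = [
    "/usr/users/sasuser/production:",
    "/usr/users/sasuser/production/checkfiles:",
    # Hypothetical symbolic link entry.
    "lrwxrwxrwx 1 sasuser users 24 May 29 2000 current -> /usr/users/sasuser/production/kpi",
    # Hypothetical line that fits neither pattern.
    "unexpected text with a/slash",
]

directories, links, other = classify_listing(sample)
print(directories)
print(len(links), len(other))
```

Unlike the `compress(path,':')` call in the macro, which deletes every colon, this sketch strips only colons at the end of the line, so a directory name containing a colon would survive intact.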

Appendix 2: An extract from macros to read operating system details about files.

The macro %findext searches a specified UNIX directory for files with a particular extension. It creates a dataset containing the names of the files, their size, date created, date last used, date last changed, and owner of each file. It caters for path names that contain blanks by enclosing them in quotes in the UNIX command.

Detail of %readext is not shown. It reads the files created by the macro %findext (see the example of a file below). The first parameter of %readext is the name of a dataset to be created. The second parameter of %readext is the name of the file to be read.

Example of UNIX command generated:

    ls -lu '/usr/users/name of path'/*.SAS > xxx.txt

    %macro findext(dir,ext,dataset,type);
      %local created used changed;
      %let created=&pgmroot/xxcreated.txt;
      %let used   =&pgmroot/xxused.txt;
      %let changed=&pgmroot/xxchanged.txt;

      data _null_;
        ext=left(trim("&ext"));
        /* ls -l lists the time of last modification; the utility records it as the creation date */
        call system("ls -l '&dir'/*." || ext || " > &created");
        /* ls -lu lists the time of last access */
        call system("ls -lu '&dir'/*." || ext || " > &used");
        /* ls -lc lists the time of last status change */
        call system("ls -lc '&dir'/*." || ext || " > &changed");
      run;

      %readext(created,&created);
      %readext(used,&used);
      %readext(changed,&changed);

      data &dataset;
        length type $ 15;
        retain type "&type"; /* a description of the type of file. E.g. V6 data */
        merge created(rename=(date=created))
              used   (rename=(date=used))
              changed(rename=(date=changed));
        by file;
      run;
    %mend;

Sample of file created and then read by the macros above. Note that the UNIX command is inconsistent in reporting the date or the time (see the second-last line).
     1 -rwxrwxrwx 1 sasuser  users   364 May 29  2000 indexip24.sas
     1 -rwxrwxrwx 1 sasuser  users   364 May 29  2000 indexoo24.sas
     1 -rwxrwxrwx 1 sasuser  users   321 Jun  1  2000 indexop24.sas
     1 -rw-rw-rw- 1 hoosen_i users    32 Jul 16  2001 iqudb7.sas
     1 -rwxrwxrwx 1 sasuser  users   156 May 24  2000 marlene.sas
     1 -rwxrwxrwx 1 sasuser  users   200 May 23  2000 mddbtest.sas
    12 -rwxrwxrwx 1 sasuser  users 12274 Apr 22 13:56 mis_auto.sas
     9 -rwxrwxrw- 1 sasuser  users  8798 Nov 20  2000 mis_auto_20112000.sas
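Since %readext is not shown, the following Python sketch illustrates one way the date-or-time inconsistency noted above could be handled. It is not the author's method. It assumes the ten-field layout of the sample lines (including the leading count), and it assumes that a time appears in place of the year only for recent files, so the year is inferred as the latest one that does not put the date after a supplied reference date. The function name and the reference dates are invented.

```python
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def parse_ls_line(line, today):
    """Parse one listing line into (owner, size, date, file name).

    `today` is the date on which the listing was produced; it is only used
    when the line shows a time of day instead of a year.
    """
    # Split into at most ten fields so that blanks inside the file name survive.
    fields = line.strip().split(None, 9)
    _count, _perms, _links, owner, _group, size, mon, day, year_or_time, name = fields
    month, day = MONTHS[mon], int(day)
    if ":" in year_or_time:
        # No year shown: take the current year unless that would be in the future.
        year = today.year
        if (month, day) > (today.month, today.day):
            year -= 1
    else:
        year = int(year_or_time)
    return owner, int(size), date(year, month, day), name


dated = parse_ls_line(
    "1 -rw-rw-rw- 1 hoosen_i users 32 Jul 16 2001 iqudb7.sas", date(2001, 8, 1))
timed = parse_ls_line(
    "12 -rwxrwxrwx 1 sasuser users 12274 Apr 22 13:56 mis_auto.sas", date(2001, 8, 1))
print(dated)
print(timed)
```

Passing the listing date in as a parameter, rather than reading the system clock, keeps the result reproducible when a stored listing is re-read for a month-on-month comparison.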

Appendix 3: Extending the use of ODS HTML output

These macros were written after examining the contents file at different stages of its creation by the ODS. I cannot claim to fully understand each of the HTML parameters used, although I can guess at some. The macros enable me to work with the contents page as a SAS specialist rather than as an HTML specialist. Once I find some time to learn HTML, I can extend the macros to include HTML-specific objects (e.g. drop-down lists).

    %macro htmlproclabel(text=);
      /* text that would have been generated by ODS PROCLABEL */
      put '<font color="#003399"><li><span>' "&text" '</SPAN><br></font>';
    %mend;

    %macro htmllevel(type=,href=,target=,text=,break=yes);
      /* manage indentation levels */
      %if &type=new %then %do;
        put '<dl>';
      %end;
      %if &type=new or &type= %then %do;
        put '<dt><b> </b>' '<A HREF="' "&href" '" TARGET="' "&target" '">' "&text" '</a><br>';
      %end;
      %if &type=end %then %do;
        put '</dl>';
        %if &break=yes %then %do;
          put '<br>';
        %end;
      %end;
    %mend;

Usage:

1) Create the contents file.

    %let outpath   = c:\output\destination; /* path to receive output files */
    %let framefile = abc_frame;
    %let contfile  = &framefile._contents;
    %let anchor    = xyz;

    ods html path    ="&outpath"(url=none)
             frame   ="&framefile..html"
             contents="&contfile..html" (no_bottom_matter)
             body    ="xxx.html" /* needed to satisfy ods, but does not affect contents page */
             ;

Include style=, newfile=, anchor= etc. as appropriate.

2) Stop SAS from updating the contents file.

    ods html close;
    ods html path ="&outpath"(url=none)
             body ="xxx.html" /* needed to satisfy ods, but does not affect contents page */
             ;

Include style=, newfile=, anchor= etc. as appropriate, but not frame= or contents=.

3) Create HTML output using SAS procedures. Note the names of the HTML files for use in the next step.

4) Use a DATA step to write into the contents file to achieve the desired layout.

    filename contents "&outpath\&contfile..html" mod;

    data _null_;
      file contents;
      %htmlproclabel(text=%quote(section heading, numbering is automatic));
      %htmllevel(type=new,href=&outpath\outputa.html#&anchor.1,target=body,text=indented text);
      %htmllevel(type=,href=&outpath\outputb.html#&anchor.6,target=body,text=more text);
      /* ... etc ... */
      %htmllevel(type=end);
      %htmlproclabel(text=%quote(another section, use quote if with commas));
      %htmllevel(type=new,href=&outpath\outputx.html#&anchor.4,target=body,text=indented text);
      %htmllevel(type=,href=&outpath\outputy.html#&anchor.3,target=body,text=more text);
      /* ... etc ... */
      %htmllevel(type=end);
      put '</BODY></HTML>';
    run;