
MP CONNECT: Warp Engine for SAS (Multi-Processing in the Sun Solaris Environment)

Pablo J. Nogueras, CitiFinancial International, Risk Management Technology, Irving, Texas

Abstract

When you are assigned a project, the first question the assignor asks is not "How will you program the project?", "What kind of quality control will you use?", or "How much data will you use?". The question asked is "How FAST can you get me the results?". There are various programming techniques in SAS that increase execution speed. One such technique is parallel processing, or multi-processing: the simultaneous execution of self-contained tasks. This paper demonstrates the use of MP CONNECT (part of SAS/CONNECT) to decrease the execution time of SAS programs.

Introduction

MP Connect is a feature of SAS/CONNECT that allows a programmer to take advantage of a multi-processor box or of processors connected via a network. MP Connect first appeared in SAS Version 8 and has continued with various improvements through SAS versions 8.1, 8.2, 9.0, and 9.1.3. My objective is twofold: examine the capabilities of MP Connect and apply those capabilities to a real-world application.

MP Connect

MP Connect can reduce processing time by subdividing programming tasks across two or more processors. In theory, processing time should be divided by the number of processors: 2 processors should cut the time in half, 3 processors should cut it to a third, and so on. However, processors are not the only part of our computing systems. I/O and system overhead must also be accounted for. When these are taken into account, the relationship starts out linear and then flattens, because each added processor requires more overhead processing.
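As a minimal sketch of the idea (not the production program in the Appendix), two independent steps can be submitted to separate SAS sessions on the same box so that they run side by side. The session names task1 and task2 and the use of sashelp.class are illustrative assumptions. The value sascmd="!sascmd" assumes the command that started the current session can be reused; otherwise give the full path to the SAS executable.

```sas
/* Hypothetical sketch: run two independent steps at the same time. */
options autosignon=yes sascmd="!sascmd";

/* wait=no returns control immediately, so task2 starts while task1 runs. */
rsubmit process=task1 wait=no;
   proc means data=sashelp.class;
      var height;
   run;
endrsubmit;

rsubmit process=task2 wait=no;
   proc freq data=sashelp.class;
      tables sex;
   run;
endrsubmit;

/* Block until both sessions finish, retrieve their spooled log and */
/* output, then end the sessions.                                   */
waitfor _all_ task1 task2;
rget task1;
rget task2;
signoff task1;
signoff task2;
```

The two steps share no data, which is the property that makes them safe to run concurrently.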
Locations of MP CONNECT documentation are provided below:

SASV8 Online DOC path: SAS/CONNECT and SAS/SHARE, SAS/CONNECT User's Guide, Changes and Enhancements, Version 8 Multi-Process (MP) CONNECT
What's New in SAS Software for Release 8.1, SAS/CONNECT
What's New in SAS Software for Release 8.2, SAS/CONNECT
SAS HELP, SAS/CONNECT

All SAS Documentation is Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.

Data Values Program

At Citigroup, we created a program to perform rudimentary analysis on all the variables in a dataset. This program is used when first developing a load program, to verify the values. It is also used on a monthly, quarterly, or yearly basis to QA data values within our datasets. The program analyzes character variables with frequency counts and numeric variables with PROC UNIVARIATE. The time to execute the program varies with the number of observations and the number of variables (rows and columns for you newer programmers). Since the program executes all of its SAS statements sequentially, we saw many execution times of 8, 16, or even 24 hours. The Data Values program is included in the Appendix.
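The split between numeric and character variables can be sketched on a small dataset. This is only an illustration: it uses sashelp.class (an assumption; any dataset would do) and swaps in PROC FREQ for the DATA step frequency count that the Appendix program uses.

```sas
/* Numeric variable: distribution summary with plots, as in the */
/* Data Values program.                                         */
proc univariate data=sashelp.class plot;
   var height;
run;

/* Character variable: frequency counts, most frequent value first. */
/* PROC FREQ stands in for the Appendix program's DATA step count.  */
proc freq data=sashelp.class order=freq;
   tables sex / nocum;
run;
```

Each variable is analyzed on its own, so the work per variable does not depend on any other variable.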

Data Values Program Sample Print

Data represents values for JUN2004 data ------- run on: 11OCT04          1
                                            09:38 Monday, October 11, 2004

The UNIVARIATE Procedure
Variable: XXXXXXXXXX (XXXXXXXXXXXXXXXXXXX AMOUNT)

Moments
N                  215488        Sum Weights         215488
Mean               651.523315    Sum Observations    140395456
Std Deviation      3811.21751    Variance            14525378.9
Skewness           15.7260807    Kurtosis            349.46136
Uncorrected SS     3.2215E12     Corrected SS        3.13003E12
Coeff Variation    584.970241    Std Error Mean      8.21017075

Basic Statistical Measures
Location                 Variability
Mean      651.5233       Std Deviation          3811
Median    100.0000       Variance               14525379
Mode      0.0000         Range                  203035
                         Interquartile Range    183.38000

Tests for Location: Mu0=0
Test           -Statistic-        -----p Value------
Student's t    t    79.35564      Pr > |t|     <.0001
Sign           M    100394.5      Pr >= |M|    <.0001
Signed Rank    S    1.018E10      Pr >= |S|    <.0001

Quantiles (Definition 5)
Quantile      Estimate
100% Max      199394.35
99%           11277.28
95%           2276.36
90%           563.15
75% Q3        213.38
50% Median    100.00
25% Q1        30.00
10%           5.00
5%            0.00
1%            0.00
0% Min        -3641.04

Extreme Observations
------Lowest------      -----Highest-----
Value        Obs        Value       Obs
-3641.04     107248     147005      198820
-2859.00     134928     153140      22348
-2500.00     104325     154089      41114
-2261.17     135603     159244      203860
-1820.00     161374     199394      207147

Data Values Program Sample Print (continued)

Data represents values for JUN2004 data ------- run on: 11OCT04          1
                                            09:38 Monday, October 11, 2004

[Plots: Histogram, Boxplot, and Normal Probability Plot for the variable, with bins from -5000 to 195000. In the histogram, * may represent up to 4423 counts.]

------------------------------------------------------------------------------------------------------------------

Report represents values for JUN2004 data,          10:39 Monday, October 11, 2004   1
Variable: XXXXXX Description: XXXXXXXXXXX CODE
Less than 50 Discrete Values -- 25 Discrete Values -- 215,488 Total Population

XXXXXXXXX    # of Records    % of Total
CODE         with Value      Population
06           70,614          32.7693
02           36,902          17.1249
10           32,641          15.1475
44           22,602          10.4888
07           13,129          6.0927
09           10,351          4.8035
17           10,058          4.6675
00           8,125           3.7705
08           6,356           2.9496
19           3,184           1.4776
39           472             0.2190
03           312             0.1448
42           198             0.0919
11           174             0.0807
04           105             0.0487
01           102             0.0473
05           87              0.0404
20           48              0.0223
13           10              0.0046
12           5               0.0023
34           5               0.0023
18           4               0.0019
41           2               0.0009
15           1               0.0005
28           1               0.0005
N = 25

------------------------------------------------------------------------------------------------------------------

Data Values Program MP CONNECT

With the release of SASV8 and the addition of asynchronous processing in SAS/CONNECT, I researched MP CONNECT and how it could be applied to our SAS programs. With the help of David Cedillo, we came up with a revision to our Data Values program. We knew that each process could analyze each variable independently of the other variables. Our tests showed that increasing the number of processes reduced the execution time through a divisor effect:

New Time = Old Time / # of Processes

This is not an exact formula, as there is overhead associated with each new process, but it can be used as an educated guess. The Data Values program with MP CONNECT is included in the Appendix.

Data Values Program Benchmark

Below are the environment and timings (before MP CONNECT) associated with the Data Values program.

Sun E10K
Solaris Version 8
40 Processors
52G Memory
324G of SAS Work Space Available (in 4 groups - 150, 70, 62, 62)
215,488 Observations
1357 Variables

NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time 8:55:40.31
      cpu time 8:55:06.96

Data Values Program MP CONNECT Benchmark

The table below examines the relationship between the number of processes, the CPU time, and the Execution Factor. The Execution Factor is the old time divided by the new time. We can clearly see the effect of I/O and system overhead when we increase usage beyond 10 processors. The times in the table match the real (elapsed) times reported in the end-of-log outputs in the Appendix.

Number of    CPU Time    CPU Time     Execution
Processes    (Hours)     (Minutes)    Factor
1            8.917       535.02       1.00
5            1.633       97.98        5.46
10           0.9         54           9.91
15           0.65        39           13.72
20           0.5         30           17.83
25           0.417       25.02        21.38
30           0.358       21.48        24.91
35           0.325       19.5         27.44
40           0.3         18           29.72
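One way to read the table is to compare each Execution Factor with the ideal factor, which equals the number of processes. The sketch below recomputes the factor from the tabulated minutes and adds a parallel efficiency column (factor divided by number of processes). The dataset and variable names are illustrative assumptions.

```sas
/* Benchmark minutes copied from the table above. */
data bench;
   input processes minutes;
   datalines;
1 535.02
5 97.98
10 54
15 39
20 30
25 25.02
30 21.48
35 19.5
40 18
;
run;

data scaling;
   set bench;
   retain base_minutes;
   if _n_ = 1 then base_minutes = minutes;   /* single-process baseline */
   exec_factor = base_minutes / minutes;     /* old time / new time     */
   efficiency  = exec_factor / processes;    /* 1.00 would be ideal     */
   format exec_factor 8.2 efficiency percent8.1;
run;

proc print data=scaling noobs;
   var processes minutes exec_factor efficiency;
run;
```

At 40 processes the factor is about 29.7, roughly 74% of the ideal factor of 40, which is the flattening described earlier.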

Data Values Program MP CONNECT Benchmark (Continued)

I have included graphs to illustrate the relationship between the number of processes and execution time, and between the number of processes and the Execution Factor (actual vs. expected scaling factor).

[Figure: CPU Time (Minutes). x-axis: Number of Processes (0 to 50); y-axis: Number of Minutes (0 to 600).]

[Figure: Execution Factor. x-axis: Number of Processes (0 to 50); y-axis: Benchmark Time/Process Time (0 to 35).]

Conclusion

MP Connect works as a tool to DECREASE execution time. The example I presented worked on a single dataset, manipulating many variables. One could also use MP CONNECT logic in programs that use independent datasets. For example, suppose you need to look at 36 months of history for customers using your monthly datasets. You may program the task using 12 MP CONNECT sessions at one time, thus reducing your execution time approximately 12 times.

Issues that one must take into account when using MP CONNECT:

It will work on a single-processor box, but this is NOT RECOMMENDED.

The more processes you execute simultaneously, the more memory, I/O, and disk resources are used. It is not recommended that you program MP CONNECT to execute more tasks than there are processors on your box.

SAS does not have an option to limit the number of MP CONNECT processes that can be executed. One must work with the users to avoid scenarios such as 25 users each using 25 MP CONNECT processes on a 30-processor box.

WORK library: depending on the SASCMD= used and how you allow people to allocate WORK libraries, the default of creating each MP CONNECT process in the same WORK library may cause I/O or space issues.

Contact Information

Pablo J. Nogueras
Lead Analyst, CitiFinancial International
Risk Management Technology
290 East John Carpenter Freeway
Irving, TX 75062
972-652-1046
pablo.j.nogueras@citigroup.com

Acknowledgements

Multiprocessing with Version 8 of the SAS System, Cheryl Doninger, SAS Institute Inc.
David Cedillo, CitiFinancial International, Decision Science

Further Reading/Research

SAS Community: Scalability and Performance
http://support.sas.com/rnd/scalability/index.html

Notices

SAS and SAS/CONNECT are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. SUN and SOLARIS are registered trademarks or trademarks of SUN Corporation in the USA and other countries. Other brands and names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries.
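On the WORK library issue raised in the Conclusion: one possible mitigation, sketched here under assumptions, is to start each session with its own WORK location by passing the -work system option through SASCMD= on an explicit SIGNON. The directories /saswork1 and /saswork2 are hypothetical, the dataset names are illustrative, and the executable path is the one used in the Appendix program.

```sas
/* Hypothetical: spread the sessions' WORK libraries across file systems. */
signon job1 sascmd="/opt/sasv8/sas -work /saswork1";
signon job2 sascmd="/opt/sasv8/sas -work /saswork2";

rsubmit process=job1 wait=no;
   /* this session's WORK datasets are created under /saswork1 */
   data work.part1;
      set sashelp.class;
   run;
endrsubmit;

rsubmit process=job2 wait=no;
   /* this session's WORK datasets are created under /saswork2 */
   data work.part2;
      set sashelp.class;
   run;
endrsubmit;

waitfor _all_ job1 job2;
signoff job1;
signoff job2;
```

Whether this helps depends on how the WORK file systems are laid out on the box, so treat it as a starting point rather than a recommendation.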

Appendix Data Values Program

dm 'clear log' ;
dm 'clear output' ;
/**********************************************************************/
/** data_value.sas                                                    */
/* Project  : XXXXXXXXXXXXXXXXXXXXXXXXXXX  Date: XXXXXXXXXX           */
/* Requestor: XXXXXXXXXXX                                             */
/* Analyst  : XXXXXXXXXXXX                                            */
/**********************************************************************/
/**********************************************************************/
/********** BEGINNING OF PARAMETERS *********************************/
libname indata '/cdsg/cis/mf/uk/aims/data/' ;
options yearcutoff=1950 obs=max symbolgen mlogic mprint source2 ;
%let dsname = mf200406 ;
%let month = %substr(&dsname,3,6) ;
%let dspre = ZZ ;
%let dset = indata.&dsname ;
%let outpath=/cdsg/users/noguerap/pgms/mp_connect_test/output/;
run ;
data _null_ ;
  length month $7.;
  month =put(input("&month",yymmn6.),monyy7.);
  call symput('month',month);
run ;
/********** END OF PARAMETERS - DON'T MODIFY AFTER HERE!!! **********/
/**********************************************************************/
filename pgm1 temp ;
run;
proc contents data= &dset out=conttemp noprint ;
/******************************************/
/* BUILD DYNAMIC ANALYSIS CODE */
/******************************************/
filename temp1 temp;
run;
data _null_;
  file pgm1 ;
  set conttemp ;
  if label='' then label='not Available';
  string='variable: ' || compress(name) || ' Description: ' || (label);
  fname=compress(trim("&outpath") || trim("&dspre") || '_' || trim(name) || '.txt' );
  if fileexist(fname) then file_xst = 1 ;
  else file_xst = 0 ;
  put ' proc printto; ';
  put ' filename out1 "' fname '"; ';
  put ' run;';
  put ' proc printto file=temp1 new; ';
  /*** Dataset gt 0 Section and omit normal processing if null ***/
  if nobs > 0 then
  do ;
    /*** Proc Univariate Section ***/
    if type=1 then
    do;
      put ' title "Data represents values for &month data ------- run on: &sysdate "; ';
      put ' proc univariate data= &dset plot; ';
      put ' var ' name '; ';
      put ' proc printto; ';
    end;
    /*** End Proc Univariate Sect ***/
    /*** Data Frequency Section, output top 50 or less frequencies ***/
    else if type=2 then
    do;
      put ' data ' name ' (index=( ' name ')); ';
      put ' set &dset (keep= ' name '); ';
      put ' data ' name '; ' ;
      put ' set ' name ' nobs=nobs end=eof; ' ;
      put ' by ' name ' ; ' ;
      put ' rec = nobs ; ' ;
      put ' attrib count format=comma12. label="# of Records with Value" ' ;
      put ' per label="% of Total Population"; ' ;
      put ' retain count countd 0 ; ';
      put ' if first.' name ' then do; ';
      put ' count =1 ;';
      put ' countd=sum(countd,1); ';
      put ' end; ';
      put ' else count = count + 1 ; ' ;
      put ' per = (count/rec)*100; ' ;
      put ' if last.' name ' then ';
      put ' do ; ' ;
      put ' if eof then ';
      put ' do ;' ;
      put ' if countd > 50 then flag = "More than 50 Discrete Values"; ';
      /* Test the value of the variable itself for the all-missing case */
      put ' else if rec = 0 or (rec=1 and ' name ' = " ") then flag = "All Values Missing"; ';
      put ' else if countd < 50 then flag = "Less than 50 Discrete Values"; ' ;
      put ' rec=compress(trim(rec)); ';
      put ' call symput("flag",flag); ';
      put ' call symput("rec",put(rec,comma12.)); ';
      put ' call symput("dis",put(countd,comma12.)); ';
      put ' end; ' ;
      put ' output ' name ' ; ';
      put ' end; ' ;
      put ' run ; ';
      put ' title1 "Report represents values for &month data,"; ' ;
      put ' title2 ' string ';' ;
      put ' title3 "&flag --&dis Discrete Values --&rec Total Population"; ';
      put ' proc sort data= ' name ' ; ' ;
      put ' by descending count; ' ;
      put ' run ; ';
      put ' proc print data= ' name ' (obs=50) n noobs label; ';
      put ' var ' name ' count per; ';
      put ' run ;';
      put ' proc datasets library=work nolist ; ' ;
      put ' delete ' name ' ; ' ;
    end;
    /*** End Data Freq Sect ***/
  end;
  /*** End Datasets with 1 or more records ***/
  /*** Null Dataset Section ***/
  else if nobs = 0 then
  do ;
    put ' proc printto ; ' ;
    put ' data _null_ ; ' ;
    put ' file temp1 mod ; ' ;
    put ' put "No Observations for Qtr Ending &month as of &sysdate"; ' ;
  end;
  /*** End Null Dataset Sect ***/
  put ' proc printto; ';
  /*** Output Section - Appends new data to top of file ***/
  put ' data _null_ ; ' ;
  put ' file temp1 mod ; ' ;
  put ' put " " ; ' ;
  put ' put "------------------------------------------------------------------------------------------------------------------" ; ' ;
  put ' run; ';
  /*** Check if file previously existed ***/
  if file_xst = 1 then
  do ;
    put ' data _null_ ; ' ;
    put ' file temp1 mod ; ' ;
    put ' infile out1 ; ' ;
    put ' input ; ' ;
    put ' put _infile_ ; ' ;
  end;
  put ' data _null_ ; ' ;
  put ' file out1 ; ' ;
  put ' infile temp1 ; ' ;
  put ' input ; ' ;
  put ' put _infile_ ; ' ;
  /* Close the copy step before its fileref is cleared */
  put ' run ; ';
  /*** End Output Section ***/
  put ' filename out1 clear ; ';
run;
%include pgm1 ;
run;

Appendix Data Values Program MP CONNECT

dm 'clear log' ;
dm 'clear output' ;
/* SAS PROGRAM DOCUMENTATION ----------------------------------------------- */
/* PROGRAM NAME: data_values_mp.sas                                           */
/* PROGRAMMER  : Pablo J. Nogueras                                            */
/* PURPOSE     : Create Data Value Dictionaries for SAS Dataset               */
/* REQUESTOR   : XXXXXXXXXXXXXX                                               */
/* INPUT       : SAS datasets                                                 */
/* OUTPUT      : Text files containing Proc Univariate (Numeric) or           */
/*               Datastep Frequency (Character) data. The frequency data is   */
/*               limited to top 50 discrete values.                           */
/* CALLED BY   : n/a                                                          */
/* CALLS       : n/a                                                          */
/* SCHEDULED   : n/a                                                          */
/* VARIABLES   : n/a                                                          */
/* -------------------------------------------------------------------------- */
/* Revision History                                                           */
/* Programmer Revision                                              Date      */
/* ========== ==================================================== ======== */
/* P NOGUERAS Modification of Original Program and David Cedillo   07/02/04 */
/*            Program                                                         */
/* -------------------------------------------------------------------------- */
/**********************************************************************/
/********** BEGINNING OF PARAMETERS *********************************/
/* The autosignon and sascmd options are necessary for MP Connect processing.
   Autosignon=Yes allows you to create a new "remote" SAS session on the
   current computer without having to specify login information.
   Sascmd= specifies the location of the SAS executable. Depending on the OS
   (in this case Solaris), you may have to specify the exact path.
*/
options obs=max pagesize=120 mlogic mprint symbolgen macrogen source2
        autosignon=yes sascmd="/opt/sasv8/sas";
libname indata "/cdsg/cis/mf/uk/aims/data" ;
%let dsname = mf200406 ;
%let month = %substr(&dsname,3,6) ;
%let dspre = XA ;
%let dset = indata.&dsname ;
%let outpath =/cdsg/users/noguerap/pgms/mp_connect_test/output/;
/* Processes (usually equal to processors) dedicated to task */
%let maxsesn = 5 ;
run ;
data _null_ ;
  length month $7;
  if "&month" = " " then
  do;
    /* Default to the previous month, formatted like the other branch */
    month = put(intnx("month",today(),-1),monyy7.);
  end;
  else
  do;
    month =put(input("&month",yymmn6.),monyy7.);
  end;
  call symput("month",month);
run ;
/********** END OF PARAMETERS - DON'T MODIFY AFTER HERE!!! **********/
/**********************************************************************/
/* Create temporary file to hold dynamic code */
filename pgm1 temp ;
/* Build dataset from Proc Contents to feed Dynamic Code creation */
proc contents data= &dset out=conttemp noprint;
run;
/******************************************/
/* BUILD DYNAMIC CODE */
/******************************************/
/* Each Variable will create an RSUBMIT block for each MP CONNECT Process */
data _null_;
  file pgm1 ; /* Output to TEMP file */
  set conttemp end=eof;
  if label='' then label='not Available'; /* Trap Missing Labels */
  /* Create title string for output */
  string='variable: ' || compress(name) || ' Description: ' || (label);
  /* Create filename to store output from Procedure */
  fname=compress(trim("&outpath") || trim("&dspre") || '_' || trim(name) || '.txt');
  /* If the file exists previously then we want to set a flag to add to the original file */
  if fileexist(fname) then file_xst = 1 ;
  else file_xst = 0 ;
  month = "&month" ;
  x + 1 ; /* MP Connect counter and job name */
  put ' rsubmit process = job' x ' wait=no ; ' ;
  put ' libname indata "/cdsg/cis/mf/uk/aims/data" ; ' ;
  put ' proc printto; ';
  put ' filename temp1 temp ; ';
  put ' run;';
  put ' filename out1 "' fname '"; ';
  put ' run;';
  put ' proc printto file=temp1 new; ';
  /*** Dataset gt 0 Section and omit normal processing if null ***/
  if nobs > 0 then
  do ;
    /*** Proc Univariate Section ***/
    if type=1 then
    do;
      put ' title "Data represents values for ' month 'data ------- run on: &sysdate"; ';
      put ' proc univariate data= indata.' memname ' plot; ';
      put ' var ' name '; ';
      put ' proc printto; ';
    end;
    /*** End Proc Univariate Sect ***/
    /*** Data Frequency Section, output top 50 or less frequencies ***/
    else if type=2 then
    do;
      put ' data ' name ' (index=( ' name ')); ';
      put ' set indata.' memname '(keep= ' name '); ';
      put ' data ' name '; ' ;
      put ' set ' name ' nobs=nobs end=eof; ' ;
      put ' by ' name ' ; ' ;
      put ' rec = nobs ; ' ;
      put ' attrib count format=comma12. label="# of Records with Value" ' ;
      put ' per label="% of Total Population"; ' ;
      put ' retain count countd 0 ; ';
      put ' if first.' name ' then do; ';
      put ' count =1 ;';
      put ' countd=sum(countd,1); ';
      put ' end; ';
      put ' else count = count + 1 ; ' ;
      put ' per = (count/rec)*100; ' ;
      put ' if last.' name ' then ';
      put ' do ; ' ;
      put ' if eof then ';
      put ' do ;' ;
      put ' if countd > 50 then flag = "More than 50 Discrete Values"; ';
      /* Test the value of the variable itself for the all-missing case */
      put ' else if rec = 0 or (rec=1 and ' name ' = " ") then flag = "All Values Missing"; ';
      put ' else if countd < 50 then flag = "Less than 50 Discrete Values"; ' ;
      put ' rec=compress(trim(rec)); ';
      put ' call symput("flag",flag); ';
      put ' call symput("rec",put(rec,comma12.)); ';
      put ' call symput("dis",put(countd,comma12.)); ';
      put ' end; ' ;
      put ' output ' name ' ; ';
      put ' end; ' ;
      put ' run ; ';
      put ' title1 "Report represents values for ' month 'data,"; ' ;
      put ' title2 ' string ';' ;
      put ' title3 "&flag --&dis Discrete Values --&rec Total Population"; ';
      put ' proc sort data= ' name ' ; ' ;
      put ' by descending count; ' ;
      put ' run ; ';
      put ' proc print data= ' name ' (obs=50) n noobs label; ';
      put ' var ' name ' count per; ';
      put ' run ;';
      put ' proc datasets library=work nolist ; ' ;
      put ' delete ' name ' ; ' ;
    end;
    /*** End Data Freq Sect ***/
  end;
  /*** End Datasets with 1 or more records ***/
  /*** Null Dataset Section ***/
  else if nobs = 0 then
  do ;
    put ' proc printto ; ' ;
    put ' data _null_ ; ' ;
    put ' file temp1 mod ; ' ;
    /* The month is written in as text: &month is not defined in the spawned session */
    put ' put "No Observations for Qtr Ending ' month 'as of &sysdate"; ' ;
  end;
  /*** End Null Dataset Sect ***/
  put ' proc printto; ';
  /*** Output Section - Appends new data to top of file ***/
  put ' data _null_ ; ' ;
  put ' file temp1 mod ; ' ;
  put ' put " " ; ' ;
  put ' put "------------------------------------------------------------------------------------------------------------------" ; ' ;
  put ' run; ';
  /*** Check if file previously existed ***/
  if file_xst = 1 then
  do;
    put ' data _null_ ; ' ;
    put ' file temp1 mod ; ' ;
    put ' infile out1 ; ' ;
    put ' input ; ' ;
    put ' put _infile_ ; ' ;
  end;
  put ' data _null_ ; ' ;
  put ' file out1 ; ' ;
  put ' infile temp1 ; ' ;
  put ' input ; ' ;
  put ' put _infile_ ; ' ;
  /* Close the copy step before its fileref is cleared */
  put ' run ; ';
  /*** End Output Section ***/
  put ' filename out1 clear ; ';
  put ' endrsubmit ; ' ;
  /*** MP Connect Control Section ***/
  /* If MP Connect counter is greater than Max Processes then begin regulating
     number of concurrent processes.
     Example: If Max Processes = 4 then Job 5 will wait for Job 1,
     Job 6 will wait for Job 2, and Job y will wait for Job y - 4 (Max Processes) */
  if x > &maxsesn then
  do;
    y = x - &maxsesn ;
    put ' waitfor _any_ job' y ' ; ' ;
    put ' signoff job' y ' ; ' ;
  end;
  /* If end of file (last variable processed) then create signoff statements for
     remaining processes. Number of remaining processes =
     total variables - (total variables - max processes) */
  if eof then
  do ;
    /* max() keeps the start valid when there are fewer variables than Max Processes */
    remjob = max(x - &maxsesn, 0) + 1 ;
    do i = remjob to x ;
      put ' signoff job' i ' ; ' ;
    end;
  end;
  /*** End MP Connect Control Section ***/
run;
%include pgm1; /* Include dynamic code for execution */
run;
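To see what the MP Connect Control Section generates, the same counter logic can be run on its own, writing the statements to the SAS log instead of to a temporary file. The values used here (a maximum of 2 concurrent sessions and 5 jobs) are illustrative assumptions, and the contents of each rsubmit block are left out.

```sas
%let maxsesn = 2;   /* hypothetical: at most 2 concurrent sessions */

data _null_;
   do x = 1 to 5;                        /* 5 hypothetical jobs */
      put 'rsubmit process=job' x 'wait=no; ... endrsubmit;';
      if x > &maxsesn then do;
         y = x - &maxsesn;               /* job that must finish first */
         put 'waitfor job' y ';';
         put 'signoff job' y ';';
      end;
   end;
   /* After the last job, sign off the sessions that are still open. */
   do i = max(5 - &maxsesn, 0) + 1 to 5;
      put 'signoff job' i ';';
   end;
run;
```

With these values, the statements generated for jobs 3, 4, and 5 wait on jobs 1, 2, and 3 respectively, and jobs 4 and 5 are signed off at the end.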

Appendix End of Log Outputs

Baseline: One Process, No MP CONNECT
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time 8:55:40.31
      cpu time 8:55:06.96

5 Processes, MP CONNECT
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time 1:38:06.88
      cpu time 1:13.75

10 Processes, MP CONNECT
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time 54:32.13
      cpu time 1:13.82

15 Processes, MP CONNECT
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time 38:55.23
      cpu time 1:17.90

20 Processes, MP CONNECT
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time 30:18.78
      cpu time 1:20.13

25 Processes, MP CONNECT
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time 25:20.33
      cpu time 1:28.12

30 Processes, MP CONNECT
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time 21:34.87
      cpu time 1:31.36

35 Processes, MP CONNECT
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time 19:29.24
      cpu time 1:34.45

40 Processes, MP CONNECT
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time 18:09.77
      cpu time 1:32.36