Medical Research Laboratories, Cincinnati, OH

USING TASK BROKER TO ENHANCE SAS® BATCH PROCESSING IN A UNIX® ENVIRONMENT

Lori Goss - Medical Research Laboratories, Cincinnati, OH

ABSTRACT

In a production environment, it is often necessary to have numerous SAS® jobs competing simultaneously for computing resources. On a UNIX® network the SAS programmer can request these resources selectively. However, specifications that are optimal on one day will very likely be less than optimal on another, and monitoring the fluctuations in computing resources becomes an overwhelming task for a programmer. The need therefore arises to select resources dynamically, based upon the current load on the computers on the network at any given time.

Hewlett-Packard (hereafter known as HP) has developed a product, Task Broker, to help resolve this dilemma. Task Broker distributes a submitted task to the most capable computer on a network. This computer is selected based upon predetermined criteria such as its ability to perform the task (i.e., its access to the software and data), its current load average, and the task priority. When configured properly, this entire process is transparent to the user and will not overburden resources being used for interactive applications. This paper describes how to configure SAS, Task Broker, and the network to exploit all of the network's resources during SAS batch processing.

AUDIENCE

This paper was written for use by programmers and SAS System Administrators. It will instruct SAS programmers to write code that can run under Task Broker. It will instruct Administrators to set up a Task Broker environment for SAS batch processing. This paper assumes a basic understanding of network features, base SAS software, and the UNIX operating system.

INTRODUCTION

Medical Research Laboratories (hereafter known as MRL) is a service organization functioning as the Central Laboratory for Clinical Trials.
We receive patient samples in the morning and have made a commitment to analyze the samples and electronically transmit the results that same evening. While assessing our current and future computing needs at MRL, we determined that there would be peak processing times throughout each day. We wanted the ability to utilize all of our resources during these peaks. Due to the nature of our company operations, these peak hours would not be the same each day and would be data dependent.

SAS batch jobs are executed by a shell script that is invoked when data is uploaded (via an ftp file transfer) from a PC. The data originates from several different areas in the lab, and the quantity of data is determined by the number of samples received that morning. Since the lab areas do not depend on each other to upload data, the SAS batch jobs can be initiated at any time, as each lab area is ready. Therefore any tools that we used to promote resource sharing would have to be dynamic.

Prior to using Task Broker, data was uploaded onto the HP 9000 Model 835 computer (network nodename 'mrl_hp'), and all SAS jobs started at the time of upload ran on the 835. But we had another, faster CPU, an HP 9000 Model 375 computer (network nodename 'hp375'), that was not being utilized. We needed an automated method that would allow us to use both of the CPUs to process the uploaded data.

One of the options presented was to distribute the SAS jobs programmatically. This would mean modifying the shell scripts each time a SAS program needed to be added or removed. Each machine would have its own list of jobs to be run at the time of each upload. So, any time new jobs were added, the System Administrator would have to determine which machine could best handle the jobs and assign the tasks accordingly. Since this happened quite often, that method could be rather cumbersome. It would also require extensive documentation to track where SAS programs were currently being executed.

SYSTEM CONFIGURATION

At MRL we have the following equipment for processing the data:

    HP 9000 Model 835 with 32 megabytes of memory and 2.2 gigabytes of disc space.
    HP 9000 Model 375 with 32 megabytes of memory and 1.2 gigabytes of disc space.

Data is fed to these machines from numerous PCs in our lab area. We are running SAS versions 6.03 and 6.07 on both Hewlett-Packard machines. We also have an 802.3 TCP/IP network linking the two machines. Both have the ARPA/Berkeley Services and NFS products loaded onto them. With NFS it is possible to locally mount a remote file system across the network.

REMOTE FILE SYSTEM ACCESS

As an example, on nodename mrl_hp we have three file systems that we wanted to access from nodename hp375. Conversely, there was one file system on nodename hp375 that we wanted to access from nodename mrl_hp. NFS Services facilitates this by allowing the System Administrator to mount the file system as type 'nfs'. When you then perform any command on data in the remote file system, the local operating system traverses the network to the remote file system and performs the function. This is all transparent to the user.
If a file system on nodename hp375 is mounted as /data and you are logged into hp375, then you can change to that directory by issuing the following command:

    cd /data

Now, let's assume that you are logged into mrl_hp and want to access the /data directory on hp375. The System Administrator must first do the following to mount the file system on mrl_hp:

    1. login as root or become superuser with the 'su' command
    2. cd /
    3. mkdir /data
    4. /etc/mount hp375:/data /data -t nfs

(This process can also be automated to occur at boot time.) Now you can issue the same command as when you were logged into hp375:

    cd /data

The three file systems that contain data on mrl_hp can likewise be mounted on hp375, so that they can be accessed in exactly the same way from both machines via libname statements. For example, if there is a data set with an absolute path name of /data3/labdata.ssd, a libname statement such as

    libname data '/data3';

can be issued from either machine, and the data set can be accessed in a SAS program as data.labdata from both machines.

This concept is vital in establishing the SAS and Task Broker environment. It enables the SAS programmer to write SAS code that will run on either machine without any changes to libnames. It also removes the need to physically move a file from one machine to another in order to give a program access to that data.

Task Broker enables a System Administrator to "create a cooperative computing environment in which computers not only perform services for each other but also intelligently distribute their workload." (Task Broker Administrator's Guide, p. 1-2).

SAS REQUIREMENTS

On a UNIX system, there are certain requirements that must be met prior to invoking SAS. For example, the user must have write access to the current working directory. This is necessary because SAS requires a sasuser sub-directory. If the sub-directory doesn't exist, it is created. The user must have read and write access to any directory (and necessary sub-directories) for all data sets that are to be modified. The user must also have the necessary permissions for any directory in which he/she wants to create a new data set. The CPUs must be configured as part of the network. Software such as ARPA/Berkeley Services and NFS Services must be installed and configured, and all required remote file systems must be locally mounted.

EXAMPLE OF STANDARD SAS BATCH JOB SUBMITTED

When a SAS batch job is submitted without Task Broker, SAS will execute the program on the computer at which it was submitted. If there are any other jobs running on that computer, they will compete for work area on the disk, memory, and processing power. If numerous jobs are running concurrently, the execution time will increase and response time will deteriorate.

If a network is in place, it would be possible to force the job to run on one of the other compatible computers on the network. However, this would require the person submitting the job to determine the optimum CPU manually. He/she would have to know the CPU power, the total memory available, how many jobs are currently running on each CPU, and so on. As previously discussed, this is not the most efficient mode of operation.
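To give a sense of what choosing a CPU by hand would involve, the following is a minimal sketch of only the last step: picking the least loaded node once the load figures are known. The function name pick_least_loaded, the "nodename load" input format, and the load values are assumptions made for illustration. Collecting the figures from each machine, and weighing CPU power and memory, are left out entirely.

```shell
#!/bin/sh
# Hypothetical helper (not part of SAS or Task Broker): read lines of the
# form "nodename load_average" on standard input and print the nodename
# with the lowest load. On a tie, the first node listed wins.
pick_least_loaded() {
    awk 'NF >= 2 && (best == "" || $2 + 0 < min) { min = $2 + 0; best = $1 }
         END { if (best != "") print best }'
}

# Made-up load figures for the two nodes described in this paper.
printf '%s\n' 'mrl_hp 2.75' 'hp375 0.40' | pick_least_loaded
```

Even this small piece would have to be rerun before every submission, which is the bookkeeping that a dynamic tool is meant to remove.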
HOW TO SET UP TASK BROKER FOR USE WITH SAS

(This assumes that all CPUs are configured on the network, all network services have been installed and are running, all necessary remote file systems have been locally mounted, and SAS has been installed on all CPUs to be used. Most of these steps must be executed as superuser; see the appropriate manual page for the su command.)

In a UNIX environment, each user must go through a login process. In a Bourne shell login, a .profile file is read upon login and defines the user's environment. It is necessary to add the SAS directory to the PATH variable in the .profile. If all users are to have access to SAS, then it could be set in /etc/profile, which is read before the individual user's .profile. Example:

    PATH=/bin:/usr/bin:/usr/local/bin:/usr/lib/sas
    export PATH

It is important to export the variable so that it is passed to any sub-shells.

After Task Broker is installed (see Chapter 2 of the Task Broker Administrator's Manual), it is necessary to configure it to process a SAS batch submittal. This can be done by customizing the /users/tbroker/tbroker.conf file. You must have one tbroker.conf file for each CPU. (See Figure 1 and Figure 2.) In each of the sets of example scripts, the first example is for nodename mrl_hp and the second is for nodename hp375. This file is used to define global options, clients, classes, and services.

Figure 1. tbroker.conf for Model 835

    GLOBAL
        ALLOW=(MRL_HP hp375)
        ADMIN_MACHINES=(MRL_HP hp375)
        SUBNETMASK=193.50.149.255
        VERBOSITY=9

    Class EXAMPLE
        max_number=4
    EndClass

    service sas_serv
        class=example
        affinity=500
        uid = *
        gid = *
        nice= 10
        args=(/users/tbroker/lib/sas.exec sas.exec)
    endservice

Figure 2. tbroker.conf for Model 375

    GLOBAL
        ALLOW=(MRL_HP hp375)
        ADMIN_MACHINES=(MRL_HP hp375)
        SUBNETMASK=193.50.149.255
        VERBOSITY=9

    Class EXAMPLE
        max_number=4
    EndClass

    service sas_serv
        class=example
        affinity=500
        uid = *
        gid = *
        nice= 10
        args=(/users/tbroker/lib/sas.exec sas.exec)
    endservice

Global options are defined for the local machine. The global options in Figure 1 and Figure 2 are the same since I want Task Broker on both CPUs to behave consistently. However, as I add CPUs to the network, I may or may not want them to handle Task Broker jobs. I can use the GLOBAL ALLOW and GLOBAL DENY parameters to allow or deny access to a CPU, respectively. The GLOBAL DENY parameter is not shown in the example scripts.

I have defined both CPUs in ADMIN_MACHINES in order to allow administration commands to be issued from either CPU to one or both CPUs. For example, if I modify the tbroker.conf file on nodename mrl_hp, I could issue the reconfig command in tadmin from nodename hp375 to reconfigure nodename mrl_hp. (tadmin is a command used by the System Administrator to maintain and/or modify the Task Broker configuration. reconfig is a function of tadmin.)

I have set VERBOSITY to 9 in order to get as much information from Task Broker as I can when a job is being submitted. This information is written to a log file in the Task Broker home directory. One of the best tools available for debugging a problem with the Task Broker setup is the Task Broker log. With the VERBOSITY level at 9 in the tbroker.conf file, you will get more detailed information about the Task Broker logic flow than if VERBOSITY is 0. When all of the bugs have been worked out, VERBOSITY can be set to a lower level to reduce the size of the log file.
As with most components of Task Broker, the user has the ability to set varying levels of complexity. The Class EXAMPLE has a very brief definition. It simply states that the class will be available. This is necessary because every service must reference a class. The parameter max_number=4 indicates that a maximum of 4 services of this class can process simultaneously.

In the service sas_serv definition, the class is set to example. I allow all users access to it by setting uid and gid (user id and group id) equal to *. The affinity is set to 500 (it must be in the range 0-999) so that one CPU doesn't take preference over the other when tasks are assigned. In this way, the job will be given to the CPU that is best able to handle the task as determined by Task Broker (not by me, the Administrator). The nice parameter functions in the same manner as the UNIX nice command, which executes the task at a lower priority, thereby reducing the toll on the processing resources.

The args parameter tells Task Broker how to execute this service. I am instructing Task Broker to use the shell script sas.exec with the full path name. If a shell script is listed, it is one that the Administrator must write. (See Figure 3 and Figure 4.) This will be explained in more detail later.

Figure 3. sas.exec for Model 835

    #!/bin/sh
    /sas/sas $SAS_OPTS

Figure 4. sas.exec for Model 375

    #!/bin/sh
    /usr/lib/sas/sas $SAS_OPTS

There are other global parameters that can be set that I have chosen not to use. They allow even greater control over Task Broker. I have only discussed the ones that are being utilized at MRL.

Figures 5 and 6 list the script files that I wrote to submit a SAS batch job to Task Broker.

Figure 5. tbsas for Model 835

    #!/bin/sh
    eval SASPG=\$$#
    SASPG=$SASPG.sas
    SAS_OPTS="$SASPG" tsub -b -s \
        sas_serv -o /tmp/`whoami`.out \
        -e /tmp/`whoami`.err /dev/null

Figure 6. tbsas for Model 375

    #!/bin/sh
    eval SASPG=\$$#
    SASPG=$SASPG.sas
    SAS_OPTS="$SASPG" tsub -b -s \
        sas_serv -o /tmp/`whoami`.out \
        -e /tmp/`whoami`.err /dev/null

Figure 7 shows a script segment which submits numerous SAS jobs to Task Broker for processing.

Figure 7

    tbsas /data2/lipdnew
    tbsas /data2/chemnew
    tbsas /data3/drugco/eligib
    tbsas /data3/drugco/creatdb

Each line in Figure 7 executes the tbsas script and passes it one argument, which is the SAS program to be run in batch mode. The eval SASPG=\$$# line in the tbsas shell script (Figures 5 and 6) sets the variable SASPG equal to the SAS program name argument that was passed to it; the next line appends the .sas suffix. The following three lines in tbsas are, at first glance, confusing, but when broken down are understandable:

    SAS_OPTS="$SASPG" tsub -b -s \
        sas_serv -o /tmp/`whoami`.out \
        -e /tmp/`whoami`.err /dev/null

The translation is as follows: assign the value of SASPG to SAS_OPTS, then immediately execute the tsub command with the options listed. (In UNIX, when a value is assigned to a variable and a command is executed on the same line, the shell automatically exports the variable to that command before starting the subshell.)

The -b option tells Task Broker that it is being called from a shell script.
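The one-line assignment form used in tbsas can be tried in isolation. The sketch below is an illustration only: it substitutes a plain sh -c child for tsub, which is an assumption made so that the example runs without Task Broker, and it borrows one program name from Figure 7 as a sample value.

```shell
#!/bin/sh
# Start from a clean state so the result does not depend on the caller.
unset SAS_OPTS

# An assignment placed in front of a command is put into the environment
# of that one command only. Here a child shell stands in for tsub.
child=$(SAS_OPTS="/data2/lipdnew.sas" sh -c 'echo "$SAS_OPTS"')
echo "child saw:  $child"

# Back in the calling shell the variable was never set.
echo "parent has: ${SAS_OPTS:-<unset>}"
```

The child process receives the value while the calling script's own environment is left untouched, so each call to tbsas can carry a different program name without one submission affecting the next.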

The -s sas_serv option names the Task Broker service that is to be used. The \ tells the shell that the command continues on the next line by escaping the newline character.

The code fragment -o /tmp/`whoami`.out directs standard output to the specified file. This file will have the name that is returned from the whoami command, so if the user is 'lori' then the file is named /tmp/lori.out. The code fragment -e /tmp/`whoami`.err directs standard error to the specified file. The file name is derived in the same manner as for standard output. Again the \ continues the command on the next line.

The code fragment /dev/null is the last item, and it indicates the data file to be used. Task Broker assumes standard input if this field is omitted. However, since the data is not coming from standard input, /dev/null is specified. The SAS program is coded to access any data it requires. /dev/null is a special file in the UNIX system that anyone can read from and get an immediate end of file.

When this command executes, Task Broker will read the tbroker.conf file and look for the sas_serv service. When it is found, Task Broker will see that it is to run the sas.exec script (see Figure 3 and Figure 4). The sas.exec script is used to tailor the SAS call to the system on which it will run. For example, in Figure 3 the absolute path to the sas executable is /sas/sas, whereas in Figure 4 it is /usr/lib/sas/sas. This indicates to Task Broker exactly where it is to find the SAS executable. The paths differ because of the way that SAS was originally installed on each machine. Therefore the sas.exec file must be tailored appropriately to account for each system's dependencies.

WHAT HAPPENS WHEN A JOB IS SUBMITTED TO TASK BROKER

The Task Broker client is the computer that has a job that it wants processed. The Task Broker server is the computer that can process the job. When a job is submitted, the Task Broker daemon on the client asks each server that offers the service about its ability to handle the job.
Each server responds with its current CPU load, its affinity for accepting the job, the maximum number of services of the requested type that it can run, and the number of jobs that it is currently running. The client compares each server's response and sends the job to the most able server. The server processes the request and notifies the client when it is complete.

For example: Client A has a SAS job that needs to be run. Server X and Server Y are both capable of servicing the request, but one is more able than the other. Server X can run one job for the service at a time and is currently running one. Server Y can also run only one task for the service at a time but is not running any. So Server Y will notify Client A that it will accept the job. When Server Y has completed the job, it tells Client A. If both servers had been busy, the SAS job that was submitted would have waited in the Task Broker queue until one of the servers was ready to accept another task. If one of the servers has a faster CPU, then it will naturally process jobs faster than the other server and can therefore accept more requests.

Jobs that depend on one another need care. If job 1, job 2, and job 3 need to be run and job 2 requires data sets created by job 1, then it would not be wise to submit them to Task Broker separately. In this scenario, if both servers are available, job 1 would go to Server X and job 2 would go to Server Y. Both jobs would then be running simultaneously, and job 2 would not have the data sets that it needs. One way around this is to link the three programs by using a %inc statement in the first program to call the other two programs. Then only the first program name should be submitted to Task Broker.
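The linking idea can be sketched in shell as a small variant: rather than editing the first program, a separate driver program is generated that includes all three in order. The function name make_driver and the job1/job2/job3 paths are hypothetical, and the driver is written to /tmp here only so that the example runs anywhere.

```shell
#!/bin/sh
# Hypothetical helper: write a driver SAS program that %includes the
# given programs one after another, in the order listed.
make_driver() {
    driver=$1
    shift
    : > "$driver" || return 1          # create or truncate the driver file
    for pgm in "$@"; do
        printf "%%include '%s';\n" "$pgm" >> "$driver"
    done
}

# Chain three dependent jobs (illustrative paths) into a single program.
make_driver /tmp/chain_demo.sas /data2/job1.sas /data2/job2.sas /data2/job3.sas
cat /tmp/chain_demo.sas
```

In practice the driver would need to live on a file system that is mounted on both machines, and only the driver would be passed to tbsas, without the .sas suffix since tbsas appends it. Task Broker then sees one job, so the three steps run in sequence on whichever server accepts it.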

IMPROVED PERFORMANCE

Task Broker has allowed our SAS batch jobs at MRL to execute more than twice as fast. We originally had all of the jobs running on the HP 9000 Model 835 machine, which has a 12 mips (million instructions per second) processor. Now, via Task Broker, many of the jobs process on the HP 9000 Model 375, which has a 25 mips processor.

CONCLUSION

I have found Task Broker to be an extremely useful and vital tool. Once set up, it has required little or no maintenance. I recently added a service for version 6.07 SAS programs. It took only minutes to set up the necessary scripts and configure Task Broker for the new service, and an hour or so to fully test it. SAS software and Task Broker complement each other nicely in a UNIX cooperative computing environment.

TRADEMARK ACKNOWLEDGEMENTS

SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. UNIX is a registered trademark of AT&T in the USA and in other countries. NFS is a trademark of Sun Microsystems, Inc. Other brand and product names are registered trademarks or trademarks of their respective companies.

REFERENCES

Task Broker Administrator's Guide, Sunnyvale, CA: Hewlett-Packard Company.

Task Broker User's Guide, Sunnyvale, CA: Hewlett-Packard Company.

Kochan, S.G. and Wood, P.H. (1990), UNIX® Shell Programming, Revised Edition, Carmel, IN: Hayden Books - A Division of Howard W. Sams & Company.

AUTHOR CONTACT

Lori Goss
c/o Medical Research Laboratories
2350 Auburn Avenue
Cincinnati, OH 45219
(513) 579-9046