A simplistic approach to Grid Computing. Edmonton SAS Users Group, April 5, 2016. Bill Benson, Enterprise Data Science, ATB Financial

Grid Computing: The Basics

Points to cover:
- Benefits of grid computing
- Server environment
- Program components
- The actual code
- Review the log(s)

Benefits of grid computing:
- higher productivity
- run SAS jobs asynchronously
- Grid Manager does all the hard work
- readily schedule jobs off hours

Server Environment

Clients: EG 6.1 and Base SAS (SAS/CONNECT)

The SAS server (5 node server, Microsoft Server 2012 R2):
- sas01: Meta Server
- sas02: Grid Manager
- sas03, sas04, sas05: Compute
- 768GB total memory (256GB / compute server)
- Storage: SASData, 15 TB server storage, General Parallel File System (GPFS)

Data sources: DB2, SQL Server, external SAS libs

Program tasks:
- activate the grid manager
- create remote sessions (signon)
- specify selection criteria (%syslput)
- submit jobs
- connect to data repositories
- process data based on selection criteria
- wait for jobs to complete
- end remote sessions (signoff)

Parameters to pass to each grid session:
- input or source data location
- output or target data location
- criteria or data selection
- main program

The Problem

Create a monthly summary of average balance by product class over the last 3 months (need to process millions of records quickly).

Use the SAS Grid Manager to run 3 remote sessions asynchronously, with each session extracting and processing 1 month of data.

What's needed:
- connect script to access the DB2 data warehouse (2 schemas and 3 tables)
- libname for the permanent SAS dataset (STAGING)
- start and end dates for Jan 2016, Dec 2015 and Nov 2015
- main SAS program

The DB2 connect script and the STAGING libname assignment reside in the COMMON directory.

Use the %SYSLPUT statement to pass date parameters to each remote session.

%SYSLPUT Statement
Creates a single macro variable in the server session or copies a specified group of macro variables to the server session.
Form 1: %SYSLPUT macro-variable=value </REMOTE=server-ID>;
Form 2: %SYSLPUT _ALL_ | _AUTOMATIC_ | _GLOBAL_ | _LOCAL_ | _USER_ </LIKE='character-string'> <REMOTE=server-ID>;
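The two %SYSLPUT forms described above can be tried in isolation before building the full program. The following is a minimal sketch, not the presenter's code: the session name `demo` and the macro variables `region` and `cutoff` are hypothetical, and it assumes a grid environment where the `grdsvc_enable` call used later in this deck succeeds.

```sas
/* Sketch only: session name DEMO and the macro variables REGION and CUTOFF */
/* are hypothetical. Assumes the grid is available as in the main program.  */
%let rc = %sysfunc(grdsvc_enable(_all_, resource=sasapp));
signon demo signonwait=yes ;

%let region = AB ;
%let cutoff = '31jan2016'd ;

/* Form 1: create one macro variable in the server session */
%syslput cutoff = &cutoff / remote=demo ;

/* Form 2: copy all user-defined macro variables to the server session */
%syslput _user_ / remote=demo ;

/* Confirm that the values arrived; this %put runs in the server session */
rsubmit demo ;
  %put NOTE: cutoff=&cutoff region=&region ;
endrsubmit ;

signoff demo ;
```

Form 1 is the better fit when each session needs different values, as in the date ranges of this problem; Form 2 is convenient when every session shares the same set of parameters.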

Grid Manager Process Schematic

- Jan 2016: SESS1 running on SAS04
- Dec 2015: SESS2 running on SAS03
- Nov 2015: SESS3 running on SAS04

Data flow: DB2 to Staging.

Each remote session is independent, with a unique WORK directory.

*** Simplistic grid enabled program *** ;

**** start the grid manager **** ;
%let rc = %sysfunc(grdsvc_enable(_all_, resource=sasapp));

**** create the remote sessions **** ;
signon sess1 signonwait=yes connectwait=no ;
signon sess2 signonwait=yes connectwait=no ;
signon sess3 signonwait=yes connectwait=no ;

**** specify date criteria **** ;
%syslput startdt = '01jan2016'd / remote=sess1;
%syslput enddt = '31jan2016'd / remote=sess1;
%syslput startdt = '01dec2015'd / remote=sess2;
%syslput enddt = '31dec2015'd / remote=sess2;
%syslput startdt = '01nov2015'd / remote=sess3;
%syslput enddt = '30nov2015'd / remote=sess3;

Initial Log for signon and %syslput statements

801 %let rc = %sysfunc(grdsvc_enable(_all_, resource=sasapp));
802 signon sess1 signonwait=yes connectwait=no ;
NOTE: Remote session ID SESS1 will use the grid service _ALL_.
NOTE: SIGNON request submitted to grid as job ID '99623'.
NOTE: SIGNON request completed to grid host 'app-p-sas04.atb.ab.com'.
803 signon sess2 signonwait=yes connectwait=no ;
NOTE: Remote session ID SESS2 will use the grid service _ALL_.
NOTE: SIGNON request submitted to grid as job ID '99624'.
NOTE: SIGNON request completed to grid host 'app-p-sas03.atb.ab.com'.
804 signon sess3 signonwait=yes connectwait=no ;
NOTE: Remote session ID SESS3 will use the grid service _ALL_.
NOTE: SIGNON request submitted to grid as job ID '99625'.
NOTE: SIGNON request completed to grid host 'app-p-sas04.atb.ab.com'.
806 %syslput startdt = '01jan2016'd / remote=sess1;
807 %syslput enddt = '31jan2016'd / remote=sess1;
808 %syslput startdt = '01dec2015'd / remote=sess2;
809 %syslput enddt = '31dec2015'd / remote=sess2;
810 %syslput startdt = '01nov2015'd / remote=sess3;
811 %syslput enddt = '30nov2015'd / remote=sess3;
812
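The date literals in the %syslput statements are typed in by hand, so they have to be edited every month. As a hedged sketch of one alternative, the start and end of each of the last n complete months can be derived with INTNX. The macro name `month_ranges` is hypothetical and not part of the original program; the sketch only prints the ranges and runs in any SAS session without a grid.

```sas
/* Sketch only: MONTH_RANGES is a hypothetical helper, not from the deck.  */
/* It prints the first and last day of each of the last &n full months.    */
%macro month_ranges(n=3) ;
  %local i today startdt enddt ;
  %let today = %sysfunc(today()) ;
  %do i = 1 %to &n ;
    /* b = beginning of the month, e = end of the month, i months back */
    %let startdt = %sysfunc(intnx(month, &today, -&i, b)) ;
    %let enddt   = %sysfunc(intnx(month, &today, -&i, e)) ;
    %put NOTE: sess&i startdt=%sysfunc(putn(&startdt, date9.)) enddt=%sysfunc(putn(&enddt, date9.)) ;
    /* In the grid program each pair could then be passed on with:    */
    /*   %syslput startdt = &startdt / remote=sess&i ;                */
    /*   %syslput enddt   = &enddt   / remote=sess&i ;                */
  %end ;
%mend month_ranges ;

%month_ranges(n=3)
```

INTNX returns numeric SAS date values rather than date literals. Those values behave the same way in the `between &startdt and &enddt` condition and in `put(&enddt,yymmn4.)`.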

**** main program is a macro ****;
%macro rollup(job) ;
rsubmit &job ;
  filename common "G:\SASData\teams\ADA\00Pgm\common" ;
  %inc common(logonihub) ; /* DB2 connect script */
  %inc common(stdlibs) ; /* STAGING libname */

  /* build the yymm suffix for the output table name from the end date */
  data _null_ ;
    call symput('yymm',put(&enddt,yymmn4.)) ;
  run ;

  /* average balance by product class for the session's date range */
  proc sql ;
    create table account_rollup_&yymm as
    select c.prod_grp_desc_lg as product_class,
           max(a.cal_day) as mth_end_dt,
           count(a.cal_day) as days,
           mean(a.bal) as avg_mth_bal
    from SOC.SOC_ACCOUNT_ENDING_DAY_BAL_PRD_V as a,
         SOC.SOC_ACCOUNT_SCD_V as b,
         DMC.DMC_PRODUCT_DIM_T1_V as c
    where a.acct_oid = b.acct_oid
      and b.prod_fk = c.prod
      and b.curr_versn_flg = 'Y'
      and a.cal_day between &startdt and &enddt
      /* and a.acct_oid = (?????????) test only */
    group by 1 ;
  quit ;

  /* copy the result from the session's WORK library to STAGING */
  proc copy in=work out=staging ;
    select account_rollup_&yymm ;
  run ;
endrsubmit ;
%mend rollup ;

**** submit the jobs **** ;
%rollup(sess1)
%rollup(sess2)
%rollup(sess3)

**** wait for all jobs to finish **** ;
waitfor _all_ sess1 sess2 sess3 ;

**** combine outputs & make a report **** ;
data final_rpt ;
  set staging.account_rollup_1511
      staging.account_rollup_1512
      staging.account_rollup_1601 ;
run ;

proc print data=final_rpt ;
run ;

**** end the remote sessions **** ;
signoff sess1 ;
signoff sess2 ;
signoff sess3 ;

Log after jobs submitted

NOTE: Remote submit to REMHOST commencing.
249 %rollup(sess1)
NOTE: Background remote submit to SESS1 in progress.
250 %rollup(sess2)
NOTE: Background remote submit to SESS2 in progress.
251 %rollup(sess3)
NOTE: Background remote submit to SESS3 in progress.
252 **** wait for ALL jobs to complete - then proceed ***** ;
253 waitfor _all_ sess1 sess2 sess3 ;
254
255 **** combine files and print a report **** ;
256 data final_rpt ;
257 set
258 staging.account_rollup_1511
259 staging.account_rollup_1512
260 staging.account_rollup_1601
261 ;
262
NOTE: There were 4 observations read from the data set STAGING.ACCOUNT_ROLLUP_1511.
NOTE: There were 4 observations read from the data set STAGING.ACCOUNT_ROLLUP_1512.
NOTE: There were 4 observations read from the data set STAGING.ACCOUNT_ROLLUP_1601.
NOTE: The data set WORK.FINAL_RPT has 12 observations and 4 variables.
263 proc sort data=final_rpt ;
264 by product_class mth_end_dt ;
265 proc print data=final_rpt ;
266 title "Avg Balance Report" ;
267
NOTE: There were 12 observations read from the data set WORK.FINAL_RPT.
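The combine step assumes that every remote session wrote its monthly table to STAGING. A small guard between the waitfor statement and the combine step can make a missing table obvious in the log. This is a sketch under stated assumptions: the macro name `check_outputs` and its parameters are hypothetical, and it relies only on the EXIST and COUNTW functions.

```sas
/* Sketch only: CHECK_OUTPUTS is a hypothetical helper, not from the deck. */
/* It reports which monthly rollup tables are missing from the library.    */
%macro check_outputs(lib=staging, months=1511 1512 1601) ;
  %local i mm n_missing ;
  %let n_missing = 0 ;
  %do i = 1 %to %sysfunc(countw(&months, %str( ))) ;
    %let mm = %scan(&months, &i, %str( )) ;
    /* EXIST returns 1 when the data set is found, otherwise 0 */
    %if not %sysfunc(exist(&lib..account_rollup_&mm)) %then %do ;
      %put ERROR: &lib..account_rollup_&mm was not created. ;
      %let n_missing = %eval(&n_missing + 1) ;
    %end ;
  %end ;
  %if &n_missing = 0 %then %put NOTE: All monthly rollup tables are present. ;
%mend check_outputs ;

%check_outputs()
```

If the STAGING libref is not assigned in the session where the macro runs, EXIST returns 0 and every table is reported as missing.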

[Screenshots: Work library allocation; Final Report]

Log for ROLLUP(sess1) after signoff

NOTE: Remote submit to SESS1 commencing.
ROLLUP JOB sess1
1 filename common "G:\SASData\teams\ADA\00Pgm\common" ;
2 %inc common(logonihub) ;
NOTE: Libref DMC was successfully assigned as follows:
      Engine: DB2
      Physical Name: IHUBPRD1
NOTE: Libref SOC was successfully assigned as follows:
      Engine: DB2
      Physical Name: IHUBPRD1
165 %inc common(stdlibs) ;
NOTE: Libref STAGING was successfully assigned as follows:
      Engine: V9
      Physical Name: G:\SASData\teams\ADA\data\staging
171
172 data _null_ ;
173 call symput('yymm',put(&enddt,yymmn4.)) ;
174
176 proc sql ;
177 create table account_rollup_&yymm as select c.prod_grp_desc_lg as product_class, max(a.cal_day) as mth_end_dt,
177! count(a.cal_day) as days, mean(a.bal) as avg_mth_bal from SOC.SOC_ACCOUNT_ENDING_DAY_BAL_PRD_V as a, SOC.SOC_ACCOUNT_SCD_V a
177! b, DMC.DMC_PRODUCT_DIM_T1_V as c where a.acct_oid = b.acct_oid and b.prod_fk = c.prod and b.curr_versn_flg = 'Y' and a.cal_day
177! between &startdt and &enddt and acct_oid in (???????????) group by 1 ;
NOTE: Table WORK.ACCOUNT_ROLLUP_1601 created, with 4 rows and 4 columns.
NOTE: PROCEDURE SQL used (Total process time):
      real time 0.09 seconds
      cpu time 0.01 seconds
179 proc copy in=work out=staging ;
180 select account_rollup_&yymm ;
NOTE: Copying WORK.ACCOUNT_ROLLUP_1601 to STAGING.ACCOUNT_ROLLUP_1601 (memtype=data).
NOTE: There were 4 observations read from the data set WORK.ACCOUNT_ROLLUP_1601.

Review:
- input (source) data
- output (target)
- criteria
- main program

Next steps:

Grid computing can be applied to:
- stable, routine, large-scale applications
- repeatable processing (Monte Carlo, modelling)

Move grid programs to the SAS Scheduling Manager.