Ten tips for efficient SAS code

Similar documents
Cheat sheet: Data Processing Optimization - for Pharma Analysts & Statisticians

Greenspace: A Macro to Improve a SAS Data Set Footprint

Working with Administrative Databases: Tips and Tricks

Surfing the SAS cache

Paper CC16. William E Benjamin Jr, Owl Computer Consultancy LLC, Phoenix, AZ

Optimizing System Performance

Using SAS to Analyze CYP-C Data: Introduction to Procedures. Overview

SOS (Save Our Space) Matters of Size

Ingo Brenckmann Jochen Kirsten Storage Technology Strategists SAS EMEA Copyright 2003, SAS Institute Inc. All rights reserved.

Top 10 Ways to Optimize Your SAS Code Jeff Simpson SAS Customer Loyalty

Chapter 6: Modifying and Combining Data Sets

SAS Scalable Performance Data Server 4.3

WEEK 3 TERADATA EXERCISES GUIDE

An Introduction to Compressing Data Sets J. Meimei Ma, Quintiles

Getting Started. 6 Save & Share. 1 Select the Dates. Choose a Starting Point. Choose the Context. Customize. Insert Metrics

Paper Quickish Performance Techniques for Biggish Data. Ryan Kumpfmiller, Ben Murphy, Jaime Thompson and Nick Welke; Zencos Consulting

Introduction to SAS Procedures SAS Basics III. Susan J. Slaughter, Avocet Solutions

Paper DB2 table. For a simple read of a table, SQL and DATA step operate with similar efficiency.

Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA

A SAS Macro for Producing Benchmarks for Interpreting School Effect Sizes

System Requirements. SAS Profitability Management 2.3. Deployment Options. Supported Operating Systems and Versions. Windows Server Operating Systems

Segregating Data Within Databases for Performance Prepared by Bill Hulsizer

SAS Performance Tuning Strategies and Techniques

Windows NT Server Configuration and Tuning for Optimal SAS Server Performance

Evolution of Database Systems

Supplementary Material for The Generalized PatchMatch Correspondence Algorithm

Sandor Heman, Niels Nes, Peter Boncz. Dynamic Bandwidth Sharing. Cooperative Scans: Marcin Zukowski. CWI, Amsterdam VLDB 2007.

Physical Level of Databases: B+-Trees

SAS File Management. Improving Performance CHAPTER 37

Introduction to SAS Procedures SAS Basics III. Susan J. Slaughter, Avocet Solutions

Batch Lazy Loader Pattern

TOP 10 (OR MORE) WAYS TO OPTIMIZE YOUR SAS CODE

In-Memory Data Management Jens Krueger

Leave Your Bad Code Behind: 50 Ways to Make Your SAS Code Execute More Efficiently.

SAS Styles ODS, Right? No Programming! Discover a Professional SAS Programming Style That Will Last a Career

A Side of Hash for You To Dig Into

SAS Scalable Performance Data Server 4.3 TSM1:

capabilities and their overheads are therefore different.

Data Set Options CHAPTER 2

Merge Sort Roberto Hibbler Dept. of Computer Science Florida Institute of Technology Melbourne, FL

Benchmark Macro %COMPARE Sreekanth Reddy Middela, MaxisIT Inc., Edison, NJ Venkata Sekhar Bhamidipati, Merck & Co., Inc.

Taking advantage of the SAS System on OS/390

Oracle Data Warehousing Pushing the Limits. Introduction. Case Study. Jason Laws. Principal Consultant WhereScape Consulting

Seagate Info. Objective. Contents. EPF File Generation for Page-on-Demand Viewing

Raima Database Manager Version 14.1 In-memory Database Engine

PERFORMANCE INVESTIGATION TOOLS & TECHNIQUES. 7C Matthew Morris Desynit

GETTING GREAT PERFORMANCE IN THE CLOUD

CS 147: Computer Systems Performance Analysis

SAS IT Resource Management Forecasting. Setup Specification Document. A SAS White Paper

Hash Objects for Everyone

SAS Programming Efficiency: Tips, Examples, and PROC GINSIDE Optimization

Tips & Tricks. With lots of help from other SUG and SUGI presenters. SAS HUG Meeting, November 18, 2010

An Oracle White Paper April 2010

Practical MySQL Performance Optimization. Peter Zaitsev, CEO, Percona July 02, 2015 Percona Technical Webinars

Guide Users along Information Pathways and Surf through the Data

A TALE OF TWO RELEASES: BENCHMARKING THE PERFORMANCE OF THE SAS SYSTEM RELEASE 6.06 AGAINST RELEASE 5.18

General Tips for Working with Large SAS datasets and Oracle tables

Indenting with Style

Built for Speed: Comparing Panoply and Amazon Redshift Rendering Performance Utilizing Tableau Visualizations

Best Practices. Deploying Optim Performance Manager in large scale environments. IBM Optim Performance Manager Extended Edition V4.1.0.

OASUS Spring 2014 Questions and Answers

Server Memory Allocation White Paper VERSION 7.1. Copyright 2016 Jade Software Corporation Limited. All rights reserved.

INTRODUCTION TO PROC SQL JEFF SIMPSON SYSTEMS ENGINEER

%DWFK$&&(66WR $'$%$6%$$ E\ 6WXDUW%LUFK IURP,QIRUPDWLRQ'HOLYHU\ 6\VWHPV6RXWK$IULFD

Practical Guide For Transformer in Production

SAS7BDAT Database Binary Format

Paper SAS Managing Large Data with SAS Dynamic Cluster Table Transactions Guy Simpson, SAS Institute Inc., Cary, NC

Table Compression in Oracle9i Release2. An Oracle White Paper May 2002

So Much Data, So Little Time: Splitting Datasets For More Efficient Run Times and Meeting FDA Submission Guidelines

Hammer Slide: Work- and CPU-efficient Streaming Window Aggregation

WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression

Gary L. Katsanis, Blue Cross and Blue Shield of the Rochester Area, Rochester, NY

Useful Tips from my Favorite SAS Resources

Craig S. Mullins. A DB2 for z/os Performance Roadmap By Craig S. Mullins. Database Performance Management Return to Home Page.

Using vmkusage to Isolate Performance Problems

Page Replacement. (and other virtual memory policies) Kevin Webb Swarthmore College March 27, 2018

HOW TO CHOOSE THE BEST MARKETING PRODUCT? zoho.com/campaigns

Automating Information Lifecycle Management with

BI-09 Using Enterprise Guide Effectively Tom Miron, Systems Seminar Consultants, Madison, WI

Filtering Data in SAS Enterprise Guide

Operating Systems. Memory Management. Lecture 9 Michael O Boyle

Advanced SQL Processing Prepared by Destiny Corporation

An Introduction to Big Data Formats

SAS Studio: A New Way to Program in SAS

Windows 10 Tips and Tricks

Estimate performance and capacity requirements for Access Services

Albis: High-Performance File Format for Big Data Systems

How To Rock with MyRocks. Vadim Tkachenko CTO, Percona Webinar, Jan

MySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona Percona Technical Webinars 9 May 2018

Why is the CPU Time For a Job so Variable?

MS SQL Server 2016 Installation rev1704

Know Thy Data : Techniques for Data Exploration

The AVG 2015 Digital Diaries Executive Summary

Execution Architecture

The Three I s of SAS Log Messages, IMPORTANT, INTERESTING, and IRRELEVANT William E Benjamin Jr, Owl Computer Consultancy, LLC, Phoenix AZ.

Visualization and clusters: collaboration and integration issues. Philip NERI Integrated Solutions Director

Are Your SAS Programs Running You?

QLIKVIEW SCALABILITY BENCHMARK WHITE PAPER

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

OLAP Introduction and Overview

Transcription:

Ten tips for efficient SAS code

Host Caroline Scottow

Presenter Peter Hobart

Managing the webinar In Listen Mode Control bar opened with the white arrow in the orange box

Efficiency Overview Optimisation has four competing factors CPU Memory I/O (disk and network) Disk space

Basic principles: Don't do more work than you need to Optimise for your environment Go for the quick wins first - Jobs which take the most time - Jobs which are run most often Benchmark after each change Efficiency Overview

Efficiency Overview Data Integration Studio produces detailed information on resources used and time taken by each step In base SAS and SAS Enterprise Guide use options fullstimer to get more information on the log

Efficiency Overview NOTE: The data set WORK.SAMPLE2 has 317223 observations and 27 variables. NOTE: DATA statement used (Total process time): real time 2.10 seconds cpu time 0.28 seconds NOTE: The data set WORK.SAMPLE2 has 317223 observations and 27 variables. NOTE: DATA statement used (Total process time): real time 0.27 seconds cpu time 0.28 seconds NOTE: The data set WORK.SAMPLE2 has 317223 observations and 27 variables. NOTE: DATA statement used (Total process time): real time 0.27 seconds cpu time 0.28 seconds On the second and third runs the input data set was cached so the runtime was much lower. On a larger data set this effect will be reduced Note the CPU time remained constant

Simple: Reads every row, Outputs every 3 rd row Data sample1; if mod(_n_,3)=1; Efficiency Programmer time vs. Program time Both produce a 1/3 sample Efficient: Reads every 3 rd row and outputs Data sample2; do i = 1 to rows by 3; set orion.customer_orders point=i nobs=rows; output; end; stop;

Subset Tip 1 Subset early Subset early sale = quantity*retailprice; proc means data=profits nonobs maxdec=2; where month=12; class continent; var sale; where month=12; sale = quantity*retailprice; proc means data=profits nonobs maxdec=2; class continent; var sale;

Tip 2 select input columns with drop and keep where month=12; sale = quantity*retailprice; proc means data=profits nonobs maxdec=2; class continent; var sale; set orion.customer_orders (keep= month quantity retailprice continent); where month=12; sale = quantity*retailprice; proc means data=profits nonobs maxdec=2; class continent; var sale;

Tip 3 select output columns set orion.customer_orders (keep= month quantity retailprice continent); where month=12; sale = quantity*retailprice; proc means data=profits nonobs maxdec=2; class continent; var sale; data profits (keep=sale continent); set orion.customer_orders (keep= month quantity retailprice continent); where month=12; sale = quantity*retailprice; proc means data=profits nonobs maxdec=2; class continent; var sale;

Read everything then select sale = quantity*retailprice; if month=12; Tip 4 where vs IF Read only the required rows sale = quantity*retailprice; where month=12;

Read everything then select sale = quantity*retailprice; if sale > 200; Tip 4 where vs IF Read only the required rows sale = quantity*retailprice; where sale > 200; sale is calculated then evaluated sale does not exist in the source table But we could use where quantity*retailprice > 200

No index sale = quantity*retailprice; where month=12; Tip 5 Use indexes Index on month sale = quantity*retailprice; where month=12; NOTE: DATA statement used (Total process time): real time 2.43 seconds cpu time 0.26 seconds NOTE: DATA statement used (Total process time): real time 0.24 seconds cpu time 0.07 seconds

Tip 6 Build indexes efficiently Recreates the data and builds an index data orion.customer_orders (index=(month)); Reads the data and builds an index proc datasets lib=orion noprint; modify customer_orders; index create month; quit;

Tip 7 You can subset during a sort proc sort data=orion.customer_orders out=orders_sorted (drop=continent); by month; where continent="europe"; Danger! Sorting a data set without specifying out= can result in data loss

Tests every condition Tip 8 optimise Conditional logic Tests only until a condition is TRUE if quantity = 1 then order="sml"; if quantity = 2 then order="med"; if quantity = 3 then order="lrg"; if quantity > 3 then order="xxl"; if quantity = 1 then order="sml"; else if quantity = 2 then order="med"; else if quantity = 3 then order="lrg"; else if quantity > 3 then order="xxl"; All four tests are performed on every row If on the current row quantity=1 then only one test is performed

Subset Tip 9 Code with your data in mind Subset early if quantity = 1 then order="sml"; else if quantity = 2 then order="med"; else if quantity = 3 then order="lrg"; else if quantity > 3 then order="xxl"; if quantity = 2 then order="med"; else if quantity = 1 then order="sml"; else if quantity = 3 then order="lrg"; else if quantity > 3 then order="xxl"; Most orders in this data contain 2 items. By testing for this value first, less tests will be performed overall

Tip 10 Consider compression Compressing data: Reduces disk space used Increases CPU Decreases IO May introduce changes in behaviour in "edit in place" SAS default compression is deliberately "light touch" Long character variables will usually compress well Numbers will not compress with the default algorithm Compression introduces a space overhead: the compression achieved should outweigh this Look at the log to check the compression achieved Poor candidates for compression can grow instead Check bufsize is not excessively large, if compression is less than expected.

Tip 10 Consider compression data customer_orders (compress=yes); NOTE: Compressing data set WORK.CUSTOMER_ORDERS decreased size by 41.70 percent. Compressed is 3603 pages; un-compressed would require 6180 pages. NOTE: DATA statement used (Total process time): real time 2.57 seconds cpu time 1.59 seconds Compress = yes or compress= char uses run length encoding

Tip 10 Consider compression data customer_orders (compress=binary); NOTE: Compressing data set WORK.CUSTOMER_ORDERS decreased size by 46.21 percent. Compressed is 3324 pages; un-compressed would require 6180 pages. NOTE: DATA statement used (Total process time): real time 2.62 seconds cpu time 2.27 seconds Compress = binary uses Ross data compression - Slightly better compression - Slightly increased CPU

And finally Examine the log! Check the results are from a successful run, not a cached data set Every site has a different mix of constraints Measurements can be distorted by other running jobs competing for resources Benchmark changes by running several times in real - world situations Test each change in isolation

Questions?

Resources More information and sources of help SAS customer loyalty http://www.sas.com/en_gb/customer-loyalty.html Links to hundreds of free resources

Thankyou sas.com