Working with Administrative Databases: Tips and Tricks

Canadian Institute for Health Information, Emerging Issues Team
Simon Tavasoli

Administrative Databases
> Administrative databases are often used to synthesize information about the health care system or to investigate health research questions
> The data may be derived from population registries, vital statistics or other records of life events, or from health claims and services data
> The Canadian Institute for Health Information (CIHI) collects and receives essential data and prepares analyses on Canada's health system and the health of Canadians
> CIHI currently holds more than 27 databases with millions of records (e.g., the National Ambulatory Care Reporting System receives millions of records each year)

Working with Administrative Databases: General Tips and Tricks
> Each day, hundreds of employees conduct analyses using SAS
> Given the magnitude of the workload on the CIHI server, using resources wisely is important. There is always a trade-off.
> Efficiency can be measured in many ways:
  - Real time
  - CPU time
  - Memory
  - Input/output
  - Original programmer time
  - Maintenance programmer time

System options for measuring performance
> OPTIONS STIMER; (default)
    NOTE: DATA statement used:
          real time           1.16 seconds
          cpu time            0.09 seconds
> OPTIONS FULLSTIMER;
    NOTE: The SAS System used:
          real time                     0.14 seconds
          user cpu time                 0.01 seconds
          system cpu time               0.05 seconds
          Memory                        1452k
          Page Faults                   1
          Page Reclaims                 2349
          Page Swaps                    0
          Voluntary Context Switches    53
          Involuntary Context Switches  5
          Block Input Operations        1
          Block Output Operations       0
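
The two options can be switched around a step you want to profile. The following is a minimal sketch, not taken from the slides: the data set name work.timing_demo and the loop are hypothetical and exist only to give the step something to time.

```sas
/* Ask for the detailed resource note (memory, context switches, I/O) */
options fullstimer;

/* Hypothetical step to profile: builds one million rows */
data work.timing_demo;
    do i = 1 to 1000000;
        x = sqrt(i);
        output;
    end;
run;

/* Go back to the shorter default note (real time and cpu time only) */
options nofullstimer stimer;
```

The figures written to the log depend on the host and the server load, so compare runs made under similar conditions.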

Optimizing performance
* Optimize performance by reducing CPU time
  - Check the program using a _NULL_ data set or the OBS= option
  - Use WHERE vs. IF
  - Use DROP and KEEP statements
  - Issues with merging data
  - Avoid unnecessary DATA steps or sorting
  - Manipulation of data with IF/THEN/ELSE statements
  - Dealing with resource-intensive calculations
* Keep the libraries clean
* Reduce the size of the tables using COMPRESS=YES

When checking your programs, use a null data set or limit the number of observations
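
A minimal sketch of both checks, assuming the sample table SASHELP.CLASS (shipped with SAS) stands in for a large administrative table. The output name work.class_test and the BMI calculation are illustrative only.

```sas
/* Check 1: DATA _NULL_ runs the step logic without writing an output data set */
data _null_;
    set sashelp.class;
    bmi = (weight / height**2) * 703;
run;

/* Check 2: the OBS= data set option stops reading after 5 observations */
data work.class_test;
    set sashelp.class (obs=5);
    bmi = (weight / height**2) * 703;
run;
```

Once the log is clean, remove the OBS= option and point the DATA statement at the real output table.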

Subsetting Datasets: WHERE vs. IF statements
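
The slide's code did not survive transcription. As a hedged illustration of the comparison, the two steps below produce the same rows from SASHELP.CLASS; the output names are hypothetical.

```sas
/* Subsetting IF: every observation is read into the program data vector,
   then tested and discarded if the condition is false */
data work.teens_if;
    set sashelp.class;
    if age >= 13;
run;

/* WHERE: the condition is applied before an observation is brought into
   the program data vector, so rejected rows are never processed */
data work.teens_where;
    set sashelp.class;
    where age >= 13;
run;
```

WHERE can only reference variables that already exist in the input data set. A condition on a variable created inside the step still needs IF.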

Process only the variables that you need
> Need only two variables
(Social Sciences Computing Cooperative)
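
One way to read only two variables is the KEEP= data set option on the input table. This sketch assumes SASHELP.CLASS as the input; the choice of name and height is arbitrary.

```sas
/* Only NAME and HEIGHT are brought into the step; the other
   variables in SASHELP.CLASS are never read */
data work.class_two_vars;
    set sashelp.class (keep=name height);
run;
```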

Subsetting datasets

Subsetting datasets: KEEP Statement
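
A short sketch of the KEEP statement, again using SASHELP.CLASS as a stand-in. The KEEP statement controls which variables are written to the output data set, whereas the KEEP= option on SET controls which are read; the variable bmi and the output name are hypothetical.

```sas
data work.class_bmi;
    /* KEEP= on input: read only the variables the step uses */
    set sashelp.class (keep=name height weight);
    bmi = (weight / height**2) * 703;
    /* KEEP statement: write only these variables to the output */
    keep name bmi;
run;
```

A DROP statement works the same way in reverse and is convenient when most variables are wanted.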

Some other Shortcuts

Merging data
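
The merge examples on these slides were not transcribed, so the following is only a generic sketch of a match-merge, not the presenter's code. The tables work.patients and work.visits and all their variables are invented for illustration.

```sas
/* Hypothetical lookup table: one row per patient */
data work.patients;
    input patient_id sex $;
    datalines;
1 F
2 M
3 F
;
run;

/* Hypothetical event table: zero or more rows per patient */
data work.visits;
    input patient_id visit_date :date9.;
    format visit_date date9.;
    datalines;
1 05JAN2018
1 17FEB2018
3 02MAR2018
;
run;

/* MERGE with BY needs both inputs sorted (or indexed) by the key */
proc sort data=work.patients; by patient_id; run;
proc sort data=work.visits;   by patient_id; run;

data work.patient_visits;
    merge work.patients (in=in_pat)
          work.visits   (in=in_vis);
    by patient_id;
    if in_pat and in_vis;   /* keep only patients that have visits */
run;
```

On large tables the sorts are usually the expensive part, so it helps to apply KEEP= and WHERE= to the inputs before sorting.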

When only one condition can be true for a given observation, write a series of IF-THEN/ELSE statements.
(Social Sciences Computing Cooperative)
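
A sketch of the idea with SASHELP.CLASS; the age bands and the variable age_group are hypothetical. With ELSE, SAS stops testing as soon as one condition is true, while a series of independent IF statements would evaluate every condition for every observation.

```sas
data work.class_groups;
    set sashelp.class;
    length age_group $ 8;
    /* Only one branch can apply, so later tests are skipped after a match */
    if age <= 12 then age_group = 'Child';
    else if age <= 14 then age_group = 'Young';
    else age_group = 'Older';
run;
```

Putting the most frequent condition first reduces the average number of comparisons per observation.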

Perform resource-intensive calculations and comparisons only once
(Social Sciences Computing Cooperative)
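
As an illustrative sketch (the variables initial and name_block are hypothetical), a function result can be stored once and reused instead of being recomputed in every comparison:

```sas
data work.class_initials;
    set sashelp.class;
    length initial $ 1;
    /* Evaluate the function calls once per observation ... */
    initial = upcase(substr(name, 1, 1));
    /* ... and reuse the stored result in each comparison */
    if initial <= 'H' then name_block = 1;
    else if initial <= 'P' then name_block = 2;
    else name_block = 3;
run;
```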

Assign many values in one statement
(Social Sciences Computing Cooperative)
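
The slide's example is missing. One plausible reading, offered here as an assumption, is to set constants with a single RETAIN statement rather than with one assignment statement per constant. The constant names in_to_cm and lb_to_kg are hypothetical.

```sas
data work.class_metric;
    /* Both constants receive their initial values once, rather than
       being reassigned on every iteration of the DATA step */
    retain in_to_cm 2.54 lb_to_kg 0.4536;
    set sashelp.class;
    height_cm = height * in_to_cm;
    weight_kg = weight * lb_to_kg;
    drop in_to_cm lb_to_kg;
run;
```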

Dealing with Missing Values
> Put missing values last in expressions
> Check for missing values before using a variable in multiple statements
(Social Sciences Computing Cooperative)
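
A small sketch of both points, using an invented table work.scores in which bonus is assumed to be the variable most often missing. The usual rationale for the ordering is that a missing value propagates through the rest of an arithmetic expression, so the earlier operations are better spent on values that are present.

```sas
/* Hypothetical input: bonus is missing for some rows */
data work.scores;
    input id base extra bonus;
    datalines;
1 10 1 2
2 20 3 .
3 30 2 5
;
run;

data work.scores_calc;
    set work.scores;
    /* The variable most likely to be missing goes last */
    total = base + extra + bonus;
    /* Test for missing once, then group every statement that needs bonus */
    if not missing(bonus) then do;
        bonus_pct = bonus / 100;
        bonus_sq  = bonus ** 2;
    end;
run;
```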

Avoid unnecessary sorting
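
One common case, shown here as a sketch rather than as the slide's own example: grouped summaries with a CLASS statement do not need the input sorted, whereas the same procedure with a BY statement would need a PROC SORT first.

```sas
/* CLASS groups the statistics by SEX without any prior sort */
proc means data=sashelp.class mean;
    class sex;
    var height weight;
run;
```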

If several different subsets are needed, avoid rereading the data for each subset
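
A minimal sketch with SASHELP.CLASS: a single DATA step names several output data sets and routes each observation with OUTPUT, so the input is read once. The output names are hypothetical.

```sas
/* One pass over the input, two subsets written */
data work.males work.females;
    set sashelp.class;
    if sex = 'M' then output work.males;
    else if sex = 'F' then output work.females;
run;
```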

Keep your SAS environment clean
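
As an illustration, intermediate tables can be deleted with PROC DATASETS as soon as they are no longer needed. The tables work.temp1 and work.temp2 are hypothetical and are created here only so the sketch runs on its own.

```sas
/* Hypothetical intermediate tables */
data work.temp1 work.temp2;
    x = 1;
run;

/* Delete them once they are no longer needed */
proc datasets library=work nolist;
    delete temp1 temp2;
quit;
```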

COMPRESS=
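
A sketch of the two places the option can be set, with hypothetical output names. Compression trades extra CPU time for less disk space and I/O, and on very small tables the compressed file can end up larger than the original, so check the note SAS writes to the log.

```sas
/* Data set option: compress this one table */
data work.class_compressed (compress=yes);
    set sashelp.class;
run;

/* System option: compress every data set created from here on */
options compress=yes;
```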