Paper CT-16 Manage Hierarchical or Associated Data with the RETAIN Statement Alan R. Mann, Independent Consultant, Harpers Ferry, WV

Similar documents
A Simple Framework for Sequentially Processing Hierarchical Data Sets for Large Surveys

An Efficient Method to Create Titles for Multiple Clinical Reports Using Proc Format within A Do Loop Youying Yu, PharmaNet/i3, West Chester, Ohio

Oracle BI 12c: Build Repositories

ABSTRACT MORE THAN SYNTAX ORGANIZE YOUR WORK THE SAS ENTERPRISE GUIDE PROJECT. Paper 50-30

Sorting big datasets. Do we really need it? Daniil Shliakhov, Experis Clinical, Kharkiv, Ukraine

Journey to the center of the earth Deep understanding of SAS language processing mechanism Di Chen, SAS Beijing R&D, Beijing, China

Programming Beyond the Basics. Find() the power of Hash - How, Why and When to use the SAS Hash Object John Blackwell

Git with It and Version Control!

A SAS Solution to Create a Weekly Format Susan Bakken, Aimia, Plymouth, MN

Countdown of the Top 10 Ways to Merge Data David Franklin, Independent Consultant, Litchfield, NH

Paper CC16. William E Benjamin Jr, Owl Computer Consultancy LLC, Phoenix, AZ

A Practical Introduction to SAS Data Integration Studio

Quality Control of Clinical Data Listings with Proc Compare

SESUG 2014 IT-82 SAS-Enterprise Guide for Institutional Research and Other Data Scientists Claudia W. McCann, East Carolina University.

MIS Reporting in the Credit Card Industry

Oracle BI 11g R1: Build Repositories

Paper ###-YYYY. SAS Enterprise Guide: A Revolutionary Tool! Jennifer First, Systems Seminar Consultants, Madison, WI

SAS ENTERPRISE GUIDE USER INTERFACE

Cleaning up your SAS log: Note Messages

PharmaSUG 2013 CC26 Automating the Labeling of X- Axis Sanjiv Ramalingam, Vertex Pharmaceuticals, Inc., Cambridge, MA

Lasso Your Business Users by Designing Information Pathways to Optimize Standardized Reporting in SAS Visual Analytics

Producing Summary Tables in SAS Enterprise Guide

WHAT IS THE CONFIGURATION TROUBLESHOOTER?

CitiManager Quick Start Guide for Program Administrators. September Bank Handlowy w Warszawie S.A.

SAS File Management. Improving Performance CHAPTER 37

Record-Level Access: Under the Hood

Merging Data Eight Different Ways

WHAT ARE SASHELP VIEWS?

Oracle BI 11g R1: Build Repositories Course OR102; 5 Days, Instructor-led

BI-09 Using Enterprise Guide Effectively Tom Miron, Systems Seminar Consultants, Madison, WI

Hash Objects for Everyone

Losing Control: Controls, Risks, Governance, and Stewardship of Enterprise Data

SAS IT Resource Management 3.3

Are Your SAS Programs Running You? Marje Fecht, Prowerk Consulting, Cape Coral, FL Larry Stewart, SAS Institute Inc., Cary, NC

Oracle BI 11g R1: Build Repositories

9 Ways to Join Two Datasets David Franklin, Independent Consultant, New Hampshire, USA

Query Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.

SAS Programming Techniques for Manipulating Metadata on the Database Level Chris Speck, PAREXEL International, Durham, NC

CDISC Variable Mapping and Control Terminology Implementation Made Easy

Personally Identifiable Information Secured Transformation

System Requirements. SAS Profitability Management 2.3. Deployment Options. Supported Operating Systems and Versions. Windows Server Operating Systems

Regaining Some Control Over ODS RTF Pagination When Using Proc Report Gary E. Moore, Moore Computing Services, Inc., Little Rock, Arkansas

Call: SAS BI Course Content:35-40hours

Basic SAS Hash Programming Techniques Applied in Our Daily Work in Clinical Trials Data Analysis

Paper AD12 Using the ODS EXCEL Destination with SAS University Edition to Send Graphs to Excel

Locking SAS Data Objects

Correcting for natural time lag bias in non-participants in pre-post intervention evaluation studies

Guide Users along Information Pathways and Surf through the Data

It s not the Yellow Brick Road but the SAS PC FILES SERVER will take you Down the LIBNAME PATH= to Using the 64-Bit Excel Workbooks.

SAS Data Libraries. Definition CHAPTER 26

Quicker Than Merge? Kirby Cossey, Texas State Auditor s Office, Austin, Texas

PROC FORMAT: USE OF THE CNTLIN OPTION FOR EFFICIENT PROGRAMMING

SAS Data Integration Studio 3.3. User s Guide

Simple Rules to Remember When Working with Indexes

Preserving your SAS Environment in a Non-Persistent World. A Detailed Guide to PROC PRESENV. Steven Gross, Wells Fargo, Irving, TX

Not Just Merge - Complex Derivation Made Easy by Hash Object

Paper SAS Taming the Rule. Charlotte Crain, Chris Upton, SAS Institute Inc.

An Interactive GUI Front-End for a Credit Scoring Modeling System

Summarizing Impossibly Large SAS Data Sets For the Data Warehouse Server Using Horizontal Summarization

DATA Step in SAS Viya : Essential New Features

Using PROC PLAN for Randomization Assignments

ADD/EDIT VENDOR. 1. To add a new vendor to the system from within the Accounts Payable module, navigate to: Accounts Payable Vendors.

National Accounts for Microsoft Dynamics SL Installation and User s Guide

National Accounts for Microsoft Dynamics SL 2011 Installation and User s Guide

Exporting Variable Labels as Column Headers in Excel using SAS Chaitanya Chowdagam, MaxisIT Inc., Metuchen, NJ

Best ETL Design Practices. Helpful coding insights in SAS DI studio. Techniques and implementation using the Key transformations in SAS DI studio.

Paper William E Benjamin Jr, Owl Computer Consultancy, LLC

Is Your Data Viable? Preparing Your Data for SAS Visual Analytics 8.2

Data Manipulations Using Arrays and DO Loops Patricia Hall and Jennifer Waller, Medical College of Georgia, Augusta, GA

Quick and Efficient Way to Check the Transferred Data Divyaja Padamati, Eliassen Group Inc., North Carolina.

Using PROC REPORT to Cross-Tabulate Multiple Response Items Patrick Thornton, SRI International, Menlo Park, CA

Computer Programming II C++ (830)

An Animated Guide: Proc Transpose

Anatomy of a Merge Gone Wrong James Lew, Compu-Stat Consulting, Scarborough, ON, Canada Joshua Horstman, Nested Loop Consulting, Indianapolis, IN, USA

Liferay User Management. Kar Joon Chew Oct 2011

Automatic Detection of Section Membership for SAS Conference Paper Abstract Submissions: A Case Study

Create a Format from a SAS Data Set Ruth Marisol Rivera, i3 Statprobe, Mexico City, Mexico

Chapter 6: Modifying and Combining Data Sets

Beginning Tutorials. PROC FSEDIT NEW=newfilename LIKE=oldfilename; Fig. 4 - Specifying a WHERE Clause in FSEDIT. Data Editing

Managing Your Website with Convert Community. My MU Health and My MU Health Nursing

SESUG Paper SD ETL Load performance benchmarking using different load transformations in SAS Data Integration Studio.

From Manual to Automatic with Overdrive - Using SAS to Automate Report Generation Faron Kincheloe, Baylor University, Waco, TX

Two-Machine Deployment of SAS Office Analytics 7.4

Hello World! Getting Started with the SAS DS2 Language

Are you Still Afraid of Using Arrays? Let s Explore their Advantages

How to Implement the One-Time Methodology Mark Tabladillo, Ph.D., MarkTab Consulting, Atlanta, GA Associate Faculty, University of Phoenix

K-means and Hierarchical Clustering

IT1105 Information Systems and Technology. BIT 1 ST YEAR SEMESTER 1 University of Colombo School of Computing. Student Manual

Copyright Basware Corporation. All rights reserved. User Guide Basware Network

Facet Folders: Flexible Filter Hierarchies with Faceted Metadata

SYSTEM 2000 Essentials

Improving Your Relationship with SAS Enterprise Guide Jennifer Bjurstrom, SAS Institute Inc.

SAS Scalable Performance Data Server 4.3

Delivering Information to the People Who Need to Know Carol Rigsbee, SAS Institute Chris Hemedinger, SAS Institute

Vendor Manual Registration. Vendor Manual Registration. Page 1 of 10

Data Presentation ABSTRACT

Delivering Information to the People Who Need to Know Carol Rigsbee, SAS Institute Chris Hemedinger, SAS Institute

Macros from Beginning to Mend A Simple and Practical Approach to the SAS Macro Facility

Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA

PharmaSUG China Paper 059

Transcription:

Paper CT-16 Manage Hierarchical or Associated Data with the RETAIN Statement Alan R. Mann, Independent Consultant, Harpers Ferry, WV ABSTRACT For most of the history of computing machinery, hierarchical data has existed, and will undoubtedly persist through the next several decades. Lining up parent and child relationships by several key fields can be a challenge, and in most cases, could be served by joins or merges. This short presentation will show SAS programmers and analysts of all levels a trick to line up parent and child data using an associated key and ultimately arranging them via a surrogate key using the RETAIN statement in a data step. What perplexes many programmers is where to place the RETAIN and the logic to make use of it. This short paper clears up this issue, demonstrating a real-world application the audience can take away and start using immediately.

Introduction In designing a SAS DATA Step to feed a report or de-normalized table with data sorted by a common key field and arranged by subordinate values, the initial thought would be to sort or index the data and move to the next problem. However, sorting will not work when the one of the identifying keys is changed due to either a revaluation, a status change or inactivation of the record in the table. When hunting and pecking through an interactive screen is the only alternative to finding associated records, there is a better solution, which is the use of the RETAIN Statement and a good key sort to arrange all present and historic records under a single identifying key held in memory while the DATA step iterates through all associated records. The Problem At issue for our session is the need to arrange all records by a uniform control number: INTAKE_NO ORIG_CONTROL_NO 6471905 9999661 6471907 9999661 6471908 9999661 8365980 9999661 8365980 9999999 (bucket value indicating a defunct account number) 8365981 9999661 This particular problem has its real-world application at a bank in the credit card division, where defunct account numbers are given a control number with a special bucket value to indicate loss or fraud activity. Hint: when you lose a credit card, the number gets retired, not recycled. Normally, the only way to find associated accounts would be to follow the Intake Number in a separate interactive online record system, take a copy or screen scrape, and paste it to a report of the portfolio to follow the activity of all lost and active account numbers. If your portfolio is in the thousands, or millions, this is a daunting and impossible task. Remember, you can base the sort on the Intake Number, but if you have sorted on the Control Number, all lost account numbers will sink to the bottom. 2

The Process Base SAS has a list of alternatives to hierarchically arrange data, such as: PROC REPORT PROC TABULATE INTERLEAVING MERGING SORTING While one could do a sort on INTAKE_NO and CONTROL_NO, remember that Intake Numbers are not uniformly sorted and unique per account, so that by itself will not pass as a solution. Merging or Interleaving (MERGE or SET) will assist, but cannot complete the job without a surrogate key, and this would be a time consuming process, causing a full stop at CONTROL_NO. PROC TABULATE or PROC REPORT could arrange the records in a pure hierarchy, but without a continuous surrogate key initiated when the portfolio was created, the lost account bucket values will always sink to the bottom. What is needed is the ability to keep the original Control Number persisting in memory until the next Intake Number is read. We would then need to add another column to indicate the true, original Control Number. The Solution Every SAS DATA Step returns to the top an observation is complete. Additional logical looping may take place to determine a logical branch or situation to modify the data being read. Knowing this, we can construct our program: a. For purposes of bookkeeping and QA, it is recommended that non-loss Account Numbers and Loss Account Numbers be stored in separate datasets. NOTE: Sort or index all upstream datasets coming into the DATA Step by INTAKE_NO. data bank.pop._lookup_table; set bank.pop._lookup_table1 bank.pop._lookup_table2; b. Set the RETAIN Statement anywhere within the DATA Step. This author s preference is to place the RETAIN after the SET Statement. retain x; c. Persist the SORT in the DATA Step to ensure the loop will maintain its sort order processing. by INTAKE_NO ; if first.intake_no then do; x = ORIG_CONTROL_NO; 3

d. This is the current value to persist and reside next to the Control Number to allow an auditor comparative information through the active and lost account history. end; e. We then create a new, non-loss Control Number. We then drop the x value The Result CONTROL_NO = x; drop x; run; INTAKE_NO CONTROL_NO ORIG_CONTROL_NO 6471905 9999661 9999661 6471907 9999661 9999661 6471908 9999661 9999661 8365980 9999661 9999661 8365980 9999661 9999999 (loss bucket number) 8365981 9999661 9999661 CONCLUSION Things to remember: Sort by a key you will not want to persist; Use the RETAIN statement anywhere in the DATA Step. Store the persisting value within the DO Loop, and assign a comparative field after breaking out of the loop for storage in memory when the DATA Step loops back to input the next observation. Use the above code in your tool kit of tricks. REFERENCES SAS Institute, Inc. 2011. SAS OnlineDoc 9.3 Cary, NC. 4

CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Name: Alan Mann Enterprise: Independent Consultant Address: 370 Sylvan Lane City, State ZIP: Harpers Ferry, WV 25425 Work Phone: (540) 336-7873 Fax: E-mail: amann1@earthlink.net Web: SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. 5