Arrays: The How and the Why of it All Jane Stroupe

Similar documents
Section 1.2: What is a Function? y = 4x

Excel Functions & Tables

3. EXCEL FORMULAS & TABLES

Scientific Programming in C X. More features & Fortran interface

Lecture 10: Boolean Expressions

This report is based on sampled data. Jun 1 Jul 6 Aug 10 Sep 14 Oct 19 Nov 23 Dec 28 Feb 1 Mar 8 Apr 12 May 17 Ju

Contents. Generating data with DO loops Processing variables with arrays

All King County Summary Report

Monthly SEO Report. Example Client 16 November 2012 Scott Lawson. Date. Prepared by

Seattle (NWMLS Areas: 140, 380, 385, 390, 700, 701, 705, 710) Summary

Seattle (NWMLS Areas: 140, 380, 385, 390, 700, 701, 705, 710) Summary

software.sci.utah.edu (Select Visitors)

Seattle (NWMLS Areas: 140, 380, 385, 390, 700, 701, 705, 710) Summary

Statistical Methods in Trending. Ron Spivey RETIRED Associate Director Global Complaints Tending Alcon Laboratories

ICT PROFESSIONAL MICROSOFT OFFICE SCHEDULE MIDRAND

SAS Scalable Performance Data Server 4.3

AIMMS Function Reference - Date Time Related Identifiers

ARRL RADIOGRAM A How To

Conditional Formatting

CS Programming I: Arrays

CSE 341 Section Handout #6 Cheat Sheet

2

What you learned so far. Loops & Arrays efficiency for statements while statements. Assignment Plan. Required Reading. Objective 2/3/2018

NCC Cable System Order

Omega Engineering Software Archive - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Asks for clarification of whether a GOP must communicate to a TOP that a generator is in manual mode (no AVR) during start up or shut down.

CMR India Monthly Mobile Handset Market Report. June 2017

2

2

New Listings. Bullock Russell Market Area. 300 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

University of Osnabruck - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

HPE Security Data Security. HPE SecureData. Product Lifecycle Status. End of Support Dates. Date: April 20, 2017 Version:

Notes Lesson 3 4. Positive. Coordinate. lines in the plane can be written in standard form. Horizontal

ADVANCED SPREADSHEET APPLICATIONS (07)

Presentation title goes here

Cody s Collection of Popular SAS Programming Tasks and How to Tackle Them

CS113: Lecture 3. Topics: Variables. Data types. Arithmetic and Bitwise Operators. Order of Evaluation

Instructor training course schedule v3 Confirmed courses due completion by 31 st July 2019

Pennington County Government Justice Center. Schematic Design - October 27, 2015

Monthly Statistical Digest

SAS System Powers Web Measurement Solution at U S WEST

Fast Lookup: Hash tables

Choosing the Right Technique to Merge Large Data Sets Efficiently Qingfeng Liang, Community Care Behavioral Health Organization, Pittsburgh, PA

High Performance Computing

15. Processing variables with arrays. GIORGIO RUSSOLILLO - Cours de prépara)on à la cer)fica)on SAS «Base Programming» 343

If Case Loop Next Exit

FREQUENTLY ASKED QUESTIONS

INFORMATION TECHNOLOGY SPREADSHEETS. Part 1

Freedom of Information Act 2000 reference number RFI

Industrial Machinery. Search Marketing Case Study

UAE PUBLIC TRAINING CALENDAR

Today s Vision Tomorrow s Reality. Brian Beaulieu CEO

Fluidity Trader Historical Data for Ensign Software Playback

1.0 PXL-500 Family Controllers

SCI - software.sci.utah.edu (Select Visitors)

Instructor : Dr. Sunnie Chung. Independent Study Spring Pentaho. 1 P a g e

Essentials. Week by. Week. All About Data. Algebra Alley

Getting Information from a Table

Major Consolidated Subsidiaries

Vlad Kolesnikov Bell Labs

DATE OF BIRTH SORTING (DBSORT)

App Economy Market analysis for Economic Development

CMIS 102 Hands-On Lab

Create Designs. How do you draw a design on a coordinate grid?

Nigerian Telecommunications (Services) Sector Report Q2 2016

Data Types. 9. Types. a collection of values and the definition of one or more operations that can be performed on those values

How to Wrangle Data. using R with tidyr and dplyr. Ken Butler. March 30, / 44

University of California - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

PROFESSIONAL CERTIFICATES AND SHORT COURSES: MICROSOFT OFFICE. PCS.uah.edu/PDSolutions

Temperature Patterns: Functions and Line Graphs

Mpoli Archive - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Penetrating the Matrix Justin Z. Smith, William Gui Zupko II, U.S. Census Bureau, Suitland, MD

More Binary Search Trees AVL Trees. CS300 Data Structures (Fall 2013)

Programming for Engineers Structures, Unions

Annex A to the MPEG Audio Patent License Agreement Essential Philips, France Telecom and IRT Patents relevant to DVD-Video Disc - MPEG Audio - general

APPENDIX E2 ADMINISTRATIVE DATA RECORD #2

GWDG Software Archive - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

COURSE LISTING. Courses Listed. Training for Database & Technology with Modeling in SAP HANA. 20 November 2017 (12:10 GMT) Beginner.

More BSTs & AVL Trees bstdelete

CIMA Asia. Interactive Timetable Live Online

SHADOW - Main Result. windpro EFFECTEN SLAGSCHADUW HERENTALS GEPLANDE WINDTURBINES HERENTALS. EDF Luminus Markiesstraat Brussel

Unit 2-2: Writing and Graphing Quadratics NOTE PACKET. 12. I can use the discriminant to determine the number and type of solutions/zeros.

Quatius Corporation - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

University of Valencia - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Getting in Gear with the Service Catalog

Eindhoven University of Technology - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Annex A to the DVD-R Disc and DVD-RW Disc Patent License Agreement Essential Sony Patents relevant to DVD-RW Disc

June 2012 First Data PCI RAPID COMPLY SM Solution

Polycom Advantage Service Endpoint Utilization Report

AVM Networks - FTP Site Statistics. Top 20 Directories Sorted by Disk Space

Contents:

ECSE 321 Assignment 2

How to declare an array in C?

Google Data Studio. Toronto, Ontario May 31, 2017

How It All Stacks Up - or - Bar Charts with Plotly. ISC1057 Janet Peterson and John Burkardt Computational Thinking Fall Semester 2016

Technical Manual Index

Polycom Advantage Service Endpoint Utilization Report

Houston Economic Overview Presented by Patrick Jankowski, SVP Research Greater Houston Partnership

Intellicus Enterprise Reporting and BI Platform

For the mobile people of. Oulu

Transcription:

Arrays: The How and the Why of it All Jane Stroupe When you need to join data sets, do you even ask yourself Why don t I just use an array? Perhaps because arrays are a mystery or perhaps because you have never looked into how efficient arrays can be, you have not put them into your lookup toolbox This presentation addresses both problems, How to utilize an array and why you should try it In addition, you will discover how to use a multidimensional array to solve a tricky data combination problem Using One Dimensional Arrays Here s the problem You have a SAS data set that looks like this: ficticious_watch_orders Date ID Price 01NOV2012 10003 1150 01NOV2012 12022 3150 01NOV2012 10004 1565 02NOV2012 11011 1655 02NOV2012 11012 2325 03NOV2012 12021 2590 03NOV2012 12022 3150 03NOV2012 12023 4300 03NOV2012 12024 3250 In this data set, the variable ID represents a character variable where the first two digits indicate the manufacturer of watches and the last two digits represents the style number You need to combine the data with the following two SAS data sets In these two SAS data sets, the variable ID is numeric ficticious_brands ID Name 10 Tagline 11 Roladex 12 Brett ficticious_styles ID Name 1 Man About Town 2 Big Gold Face 3 Moons Abound 4 Aqua Proof 11 Prince Someone 12 Princess 13 Not Your Dads Watch 21 One Time Zone 22 Two Time Zones 23 Three Time Zones 24 Four Time Zones 25 Five Time Zones

There are several ways that you could proceed, but since the ID variable in the ficticious_watch_orders SAS data set is character and contains both the manufacturer id and the style id, you are going to have to process ficticious_watch_orders to split the ID variable You could have combined the ficticious_brands SAS data and the ficticious_styles SAS data to create a variable which represents the combination of brand ID and style ID Regardless of how you decide to create the appropriate ID, you still will have to sort the ficticious_watch_orders SAS data in order to merge the three data sets together So let s try to do this all in one DATA step and avoid a sort Example 1 First, a simple example will demonstrate how an array could be used to combine the ficticious_brands SAS data with the ficticious_watch_orders SAS data data watch_sales; set ficticious_watch_orders; array man{10:12} $20 _temporary_ ('Tagline','Roladex','Brett'); M_ID=input(substr(ID,1,2),2); Manufacturer=man{M_ID}; Create a temporary array that stores the character constants Tagline, Roladex, and Brett in a length of 20 The array name is man and the index values range from 10 to 12 Use the SUBSTR function to extract the first two characters of the ID variable and the INPUT function to convert the value to numeric Array indexes must be numeric The variable Manufacturer is found by looking for the element of the array man that corresponds to the variable M_ID This example required that you type the values for the brand into the array Since you have the data in a SAS data set already, you can use IF/THEN logic and DO loops to load the array with the values of the brand Example 2 data watch_sales; set ficticious_watch_orders; array man{10:12} $20 _temporary_; if _N_=1 then do i=1 to NumObs; set ficticious_brands(rename=(id=b_id)) nobs=numobs; man{b_id}=name; M_ID=input(substr(ID,1,2),2); Manufacturer=man{M_ID}; Create a temporary array that stores the character constants in a length of 20 The array name is MAN and the index values range from 10 to 12 But no initial values are specified

The first time through the DATA step the DO loop is going to read all of the data from the ficticious_brands SAS data set The NumObs variable is set during compile time when the NOBS= option on the SET statement names the variable It is important to specify if _n_=1 to execute the DO loop once Executing the DO loop a second time would stop the DATA step because the SET statement would encounter the end of the SAS data set ficticious_brands, resulting in one observation in the watch_sales SAS data set In addition to creating the value of the variable NumObs during compile time, the RENAME= option changes the name of the ID variable in the ficticious_brands SAS data set so it is not confused with the ID variable in the ficticious_watch_orders SAS data set Load each element in the man array with the value of Name Now continue the program to include the style information from the ficticious_styles SAS data set Example 3 data watch_sales; set ficticious_watch_orders; array man{10:12} $20 _temporary_; array style{25} $35 _temporary_; if _N_=1 then do i=1 to NumObs; set ficticious_brands(rename=(id=b_id)) nobs=numobs; man{b_id}=name; if _N_=1 then do i=1 to Num; set ficticious_styles(rename=(id=s_id)) nobs=num; style{s_id}=name; M_ID=input(substr(ID,1,2),2); St_ID=input(substr(ID,length(ID)-1),2); Manufacturer=man{M_ID}; Style_Name=style{St_ID}; Just like the previous ARRAY statement, this ARRAY statement creates an array named style to hold the constant names of the styles Even though there are not 25 styles in the SAS data set ficticious_styles, the array refers to 25 values The index variable for an array must be consecutive integers Reads from the SAS data set ficticious_styles and names the NOBS= variable NUM (note, it is different from the previous NOBS= variable) After that, the program is almost identical to the previous one

Example 4 The SAS data set sales contains the sales for five styles of watches: Sales Date Style1 Style2 Style3 Style4 Style5 31JAN2012 129800 149800 171336 152024 1042 29FEB2012 282400 31400 372768 354912 4089 31MAR2012 68700 75570 90840 86156 9387 30APR2012 127200 13200 167900 151136 1333 31MAY2012 544000 584000 718800 62720 7754 30JUN2012 96000 105300 127160 117044 1728 31JUL2012 37000 38700 45840 41226 4943 31AUG2012 43980 43780 58536 52248 6267 30SEP2012 77600 85600 102430 92178 1166 31OCT2012 151400 166400 199980 179972 2928 30NOV2012 39000 42000 51480 46330 5554 31DEC2012 90000 99400 11980 10739 8742 The SAS data set target contains the target figures, in millions of dollars, for each month in the years 2009 to 2012 Target Year M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 2009 2 3 2 4 3 2 1 1 1 2 2 1 2010 3 5 3 6 5 4 3 3 3 3 4 3 2011 3 6 4 7 2 5 2 4 4 4 5 4 2012 5 5 3 6 4 6 4 2 5 6 6 5 To calculate the difference between the total sales amount for each date and the target for that month in 2012, you use PROC TRANSPOSE, and several DATA steps proc transpose data=target(where=(year=2012)) out=target_t(rename=(col1=target)) name=mon; data target_t_mon; set target_t; where Target ne 2012; Month=input(substr(Mon,2),2); data sales_sum(sortedby=month); set sales; Month=month(Date); Total=sum(of Style:); data compare_; keep Date Style1-Style5 Total Target Difference; merge sales_sum target_t_mon; by Month; Difference=Total-(Target*1000000);

Instead, you could have combined the two data sets in one DATA step, using an array data compare; keep Date Style1-Style5 Total Target Difference; array mon{*} Month1-Month12; if _N_=1 then set target (where=(year=2012)); set sales; Total=sum(of Style:); Month=month(Date); Target=mon{Month}*1000000; Difference=Total-Target; Using a Multidimensional Array to combine data Suppose you have a data set that contains high and low temperatures, along with the wind speed: temps Date High Low Wind 29AUG2012 8 0 14 30AUG2012 7-2 22 31AUG2012 9-1 20 01SEP2012 8 0 23 02SEP2012 9 1 19 03SEP2012 10 2 14 04SEP2012 12 3 16 05SEP2012 10 2 23 06SEP2012 12 0 12 In addition, you have a table that provides the temperatures based on the wind chill factors; wchill WSpeed Neg10 Neg5 Tmp0 Tmp5 Tmp10 Tmp15 Tmp20 Tmp25 Tmp30 5-22 -16-11 -5 1 7 13 19 25 10-28 -22-16 -10-4 3 9 15 21 15-32 -26-19 -13-7 0 6 13 19 20-35 -29-22 -15-9 -2 4 11 17 25-37 -31-24 -17-11 -4 3 9 16 30-39 -33-26 -19-12 -5 1 8 15 35-41 -34-27 -21-14 -7 0 7 14 40-43 -36-29 -22-15 -8-1 6 13 Note that the wind speed is rounded to the nearest five, as is the temperature that ranges from -10 to 30 To combine these two data sets, we will load the wchill data into a two dimensional array and be able to access the appropriate row and column for the high temperature and low temperature based on the wind speed based on the data in the temps SAS data set

Here is the program: data wndchll(keep=date Wind High Low HighChill LowChill); array WC{8,9} _Temporary_; array farenheit{*} High Low HighChill LowChill; if _n_=1 then do I=1 to 8; set wchill; array Tmp{9} Neg10 -- Tmp30; do J=1 to 9; WC{I,J}= Tmp{J}; set temps; Row=round(Wind,5)/5; Column1=(round(High,5)/5)+3; Column2=(round(Low,5)/5)+3; HighChill=round(WC{Row,Column1}); LowChill=round(WC{Row,Column2}); do i=1 to dim(farenheit); Farenheit{i}=9/5*Farenheit{i}+32; Creates a temporary array named W that refers to 8 rows and 9 columns The rows and columns correspond to the observations and variables in the wchill SAS data set Creates an array to refer to the High and Low temperatures in the temps SAS data set and create the new variables HighChill and LowChill temperatures The first time through the DATA step, the DO loop causes the SET statement to execute 8 times, thus reading through the wchill SAS data set The Tmp array refers to the nine temperature variables in the wchill SAS data set The DO loop is going to fill the array W with all of the values in the data set wchill The three assignment statements round off the temperatures and wind speed that are being read from the temp SAS data set Using the W array, retrieve the value that is associated with the low temperature and wind speed, high temperature and wind speed The final DO loop is going to convert the variables containing the temperatures from Celcius to Fahrenheit Conclusion There are many SAS techniques that can be used to combine data horizontally If you have an appropriate index variable (remember, it must be a consecutive integer), an array could be the most efficient technique And if you do not have a consecutive integer, try a DATA step hash object

Contact Information Jane Stroupe SAS Contract Instructor jgsstroupe@gmailcom