An Animated Guide : Speed Merges: resource use by common procedures Russell Lavery, Contractor, Ardmore, PA

ABSTRACT

This paper compares how resources are used by different SAS table lookup (Figure 1) techniques. It will be useful for programmers who are starting to study programming efficiency or who are experiencing efficiency problems with current programs. Understanding the resource use of basic processes gives programmers a logical structure for selecting the table lookup technique that will most likely solve their efficiency problem.

Efficiency means using resources gently. Working for efficiency has many dimensions and involves trade-offs (Figure 2).

[Figure 2: EFFICIENCY TRADEOFFS. Typical complaints: takes too long; file is too big. Dimensions shown: CPU, memory utilization, IO utilization, space, initial programming effort, maintenance programming effort.]

Advanced table lookup techniques all share one characteristic: they perform a Sorted By Merge (or close to one) without sorting. The advanced table lookup techniques (format, _IORC_, hashing, SQL, tagsort, modify, etc.) make different demands on I/O, CPU and disk space. Understanding the demands of the different techniques lets a programmer select the best technique for his/her particular situation.

Several years of NESUG/SUGI proceedings papers are online. These papers can be downloaded and read for an in-depth discussion of each of the table lookup techniques mentioned here. This paper will help readers decide which of the table lookup techniques will best help them, and offers a searchable database of the articles available online.

[Figure 1: Concerned with FAST table lookup. Table lookup is our old friend the IF statement:
    If zip= ... then state="PA";
    Else if zip= ... then state="NJ";
    Else state="??";
Table lookup is often done by accessing another file (by merge, format, _IORC_, hashing, etc.). Table lookup is also subsetting. The diagram shows a small file (key, var1, var2, etc.) and a large file (key, var1, var2, etc.) feeding a DATA step, often a By Merge, that produces some logical result.]
INTRODUCTION

A table lookup is the use of a key, or id, variable in a small file to look up values in a large file. This often takes the form of taking a subset of the large file. The goal of this paper is to support good decision making when selecting a table lookup technique. The paper discusses the details of common SAS processes and focuses on the issues of CPU usage and disk space requirements.

Generally, people study efficiency because a job takes too long to run or write, or because it creates a file that is too big for their system. Efficiency is using the most constrained resource gently. It is a multidimensional concept and usually involves making trade-offs (Figure 2). Programs that use little RAM often have excessive IO. Programs that run quickly often have high CPU usage/requirements. Knowing how processes use resources helps programmers make those trade-offs.

The processes discussed are:
  Basic sequential SAS read of a data set
  Sorting
    1) SAS sort
    2) Host sort
    3) Asserted Sort
    4) Ordered by
  The binary search process
  Formats
  Indexes
    1) _IORC_
    2) tagsort
  Key Indexing, Bitmapping and Hashing

Finally, the paper concludes with a graphic that ranks the different techniques and offers an Access database that can be used to identify useful online articles.

THE BASIC SAS READ

When SAS executes a set/merge command on a file, it reads the file from top to bottom. This top to bottom read is a fast access technique for SAS. It is part of data step processing and is how many procs access data. Total run time is the time for one read multiplied by the number of observations. SAS performs a fast operation (a sequential read) but performs one for every observation, so on a large file the total time can still be too high.

[SAS sequential read of some file, with processing time shown. Example:
    Data new;
      set old;
      tax=sales*.06;
Reading a SAS file sequentially (first obs to last) is a very fast process (for SAS). SAS performs a read for each observation, but each individual read is fast.]
Figure 3

SORTING AND THE SORTED BY MERGE

Sorting is a common SAS technique and a required step for the Sorted By Merge table lookup. A Sorted By Merge is easy to program but is very CPU and disk intensive (Figure 4).

When SAS sorts a file (say work.small) it creates a temp copy of the file AND a sorting copy that is about 3 ½ times the size of the original file. Sorting of obs. is done in the temp file and, if successful, the sorted obs are written back to work.small. After overwriting work.small, the temp file is automatically deleted and disk/memory space is freed up. It is the large size of the temp file (3 ½ times the size of the original file) that sometimes causes sorting to fill up a disk and fail. The sortsize=max and noequals options (shown in Figure 4) should be used as sort options whenever possible.

If datasets are small (several thousand obs), the Sorted By Merge is recommended because it is so easy to program and the hardware demands (CPU and disk) are easily met by modern equipment. With small files, the preliminary sorting and the top to bottom read in the data step take so little total run time that there is little incentive to explore advanced table lookup techniques.

[Figure 4: SAS Sort. The diagram shows the big file, a copy of the file and a sorting file. Sorting creates a sorting file that is approximately 2 1/2 times the size of the source file.
    Proc sort data=small noequals sortsize=max;
      by state zip;
    Proc sort data=big noequals sortsize=max;
      by state zip;
]

With large files, programmers can experience time/disk space problems. In this case Sorted By Merges should be avoided. Sorting is a resource intensive operation, taking lots of time and lots of disk space. Sorting large files has crashed systems.

Resource demand for Sorted By Merge: CPU: HIGH  DISKSPACE: HIGH  IO: HIGH

THE SAS ASSERTED SORT

The fastest way to sort data is to get the data delivered in sorted order and simply tell SAS to treat the data as sorted. This is the Asserted Sort. It takes zero time and is shown in Figure 5.
Clients often struggle to sort data that is already sorted, or that can be requested already sorted from suppliers. The sortedby=variable option can be applied to the data or set statement. The box in the lower left of Figure 5 shows what Proc Contents would report about data set one in Figure 5. SAS says the data is sorted but reports that SAS has not sorted the data (Validated: NO). If the data is sorted by a Proc Sort, the Validated characteristic would be YES. SAS will create, and use, the first. and last. variables on data that is asserted to be sorted.

[Figure 5: SAS Asserted Sort. If you know your data is sorted, you can assert that it is sorted and SAS will believe you. You can assert when you create or use. Great time saver!! No work done!!
    data one /*(sortedby=zip)*/;
      infile datalines;
      input zip $char5.;
    datalines;
    ...
    ;
    data two;
      set one (sortedby=zip);
      by zip;
      if first.zip and last.zip;
    proc print data=two;
Proc Contents shows: Sortedby: zip  Validated: NO]

Resource demand for Asserted Sort: CPU: NONE  DISK: NONE  IO: NONE

HOST SORT VS SAS SORT

The word on the street is that the SAS Proc Sort is very good for small/moderate data sets but is a bit slow if the data set is large. Unix and some mainframes have sorting routines that can be faster than SAS Proc Sort. SAS can be instructed to use a custom/host operating system sorting routine rather than its own. A programmer can instruct SAS to use the SAS sort, a host sort, or the best sort method (host sort if the file is above a certain size). One danger of using a host sort is that it can use a different sorting sequence from SAS. With the "best" setting, files are then sorted one way if they are smaller than the cutpoint (sorted by SAS) and a different way if they are larger than the cutpoint (sorted by the host sort).

[Figure 6: Select SAS vs. Host Sort. Sometimes the operating system has a very fast sort. You can specify the SAS sort, the host OS sort, or best sort. Conventional wisdom says the SAS sort is good for small files (test and set your own cutpoint).
    Options sortcutp= xx ;
    Options sortpgm=sas;
    Options sortpgm=host;
    Options sortpgm=best;
Watch out!!! There can be differences between SAS and host sort logic.]

SAS for MS Windows, before version 9, cannot support a host sort. It accepts the commands without issuing notes/warnings but does not do anything.

THE BINARY SEARCH PROCESS

The binary search process underlies several SAS procedures. It has been shown, mathematically, to be the optimal search method under conditions common to many SAS programming tasks. Binary searches are part of formats and indexes. A binary search searches an ordered file by repeatedly dividing it in half.

Figure 7 shows the binary search process applied to a format. The format records are ordered in the catalog and SAS finds a format value by repeatedly applying simple logic/rules. In Figure 7 we check whether subject 10 was assigned to the test or control treatment. SAS picks the middle observation in the file (9) and asks if that is the observation it was seeking. If it is, the search stops. If it is not, SAS asks whether the current value is above or below the desired value. It is below (9 is less than 10), so SAS divides the file in half and only considers from 9 to 17. It then picks a number in the middle of the new range (14) and asks if that is the desired number. It is not, and it is above the desired number, so SAS divides the range in half and only considers from 9 to 14. It picks a number in the middle of the new range (11). It is not the desired number, and it is above it, so SAS divides the range in half again and only considers from 9 to 10. It picks a number in the new range (10) and asks if that is the desired number. Since it is, the process stops.

[Figure 7: Formats use binary searches: searching a file by halves. SAS uses binary searches to look within formats and indexes.
    TorC=PUT(PAT_ID,InSml.);
Format file:
    001 test     002 control  003 test     004 control  005 control  006 control
    007 test     008 control  009 test     010 control  011 test     012 test
    013 control  014 test     015 control  016 test     017 control
Lets look for subject 10:
  See if the subject number is in the exact middle of the file.
  If it is not, is it above or below the middle of the file? Define the range.
  See if the subject number is in the exact middle of the range defined above.
  Repeat until success.]

Figure 7 illustrates a binary search using a format file and shows information being transferred from the format to the PDV via the put function.
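The halving logic described above can be written out by hand in a data step. The following is only a sketch of the idea, not how SAS implements format or index searches internally. The data set name lookup, the array names and the variable names are assumptions made for illustration; the 17 rows are the ones listed in Figure 7.

```sas
/* Hypothetical lookup table holding the 17 rows of Figure 7, sorted by pat_id */
data lookup;
    input pat_id trt $;
    datalines;
1 test
2 control
3 test
4 control
5 control
6 control
7 test
8 control
9 test
10 control
11 test
12 test
13 control
14 test
15 control
16 test
17 control
;
run;

data _null_;
    /* Load the sorted keys and their values into temporary arrays */
    array ids(17)      _temporary_;
    array trts(17) $ 8 _temporary_;
    length result $ 8;
    do i = 1 to n;
        set lookup nobs=n point=i;
        ids(i)  = pat_id;
        trts(i) = trt;
    end;

    target = 10;      /* the subject being looked up */
    lo     = 1;       /* lower bound of the range still under consideration */
    hi     = n;       /* upper bound of that range */
    found  = 0;
    probes = 0;

    do while (lo <= hi and not found);
        mid = floor((lo + hi) / 2);            /* middle of the current range */
        probes = probes + 1;
        if ids(mid) = target then found = 1;
        else if ids(mid) < target then lo = mid + 1;  /* keep the upper half */
        else hi = mid - 1;                             /* keep the lower half */
    end;

    if found then do;
        result = trts(mid);
        put 'Subject ' target 'is ' result 'after ' probes 'probes';
    end;
    else put 'Subject ' target 'was not found';
    stop;             /* required because point= never reaches end of file */
run;
```

With this midpoint rule the probes land on 9, 13, 11 and 10, so subject 10 is found on the fourth probe. Figure 7 picks 14 rather than 13 on the second step; that reflects a different rounding choice, not a different method. Four probes for 17 rows, against up to 17 for a top to bottom scan, is the reason the technique scales well.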
FORMATS

SAS formats automatically convert data values to the formatted value when data is displayed. Formats can convert one character to another, a character to a number, a number to a character or a number to a different number. Formats are created with a Proc Format. They take time to create and disk space after they are created. They can be re-used, so the cost of creation needs to be paid only once. There is no automatic maintenance performed on formats. Formats are very useful SAS tools to save time and save disk space.

[Figure 8: SYNTAX. The Proc Format below creates THREE formats.
    proc format;
      value skill
        LOW -< 1 = "BAD # LOW"
        1        = "SAS"
        2        = "Java"
        3,5,6    = "Microsoft"
        6 <- HIGH = "BAD # HIGH";
      value $Gen_Age
        "M" = "A.M."
        "F" = "A.F."
        "C","B","A" = "Non-Adult"
        other = "error";
      value $WHR_IN   /*CHAR RANGES */
        low -< "00000"    = "bad zip"
        "19000"-"19099"   = "PHILA"
        "19100"-"19400"   = "pa"
        "80000"-"89999"   = "JERSEY"
        other             = "UNKNOWN";
    PROC PRINT DATA=EX_2;
      FORMAT GENDER $Gen_Age. JOB SKILL. ;
    RUN;

Data Set Ex_2:
    NAME  GENDER  JOB
    Bob   M       1
    Russ  M       2
    Sue   F       2
    AJ    M       1
    Dot   F       2

OUTPUT (prints using the format):
    NAME  GENDER  JOB
    Bob   A.M.    SAS
    Russ  A.M.    Java
    Sue   A.F.    Java
    AJ    A.M.    SAS
    Dot   A.F.    Java
]

Figure 8 shows the normal creation, and typical use, of character and numeric formats. The large box on the left contains the syntax for a Proc Format and a Proc Print that applies those formats. A data set is in the upper right hand corner and the output of the Proc Print, applying the formats to that data set, is in the lower right.

A large portion of the speed of the format table lookup comes from the fact that it is usually a memory resident technique and avoids disk access. Theoretically there is no limit on the number of levels in a SAS format. However, when the format table lookup executes, the whole format must fit in RAM or performance suffers. Some OS cannot page formats and will crash if the format is larger than the available RAM.
Some OS will page large formats between disk and RAM, allowing the job to complete, but increasing run time.

SAS code that would select, from a very large file, patients that had been assigned to be controls (imagine the file in Figure 7 is the format InSml) is shown below. This is a Format Table Lookup.

    Data subset;
      Set very_big_file;
      If put(pat_id,insml.)="control";
    Run;

Formats are stored in a catalog (permanent or work) and take RAM/disk space (Figure 9). When formats are created from an input file (see Figure 9), the data from the input file is summarized in a file (often called cntlin). Cntlin is used as input to a Proc Format. This cntlin file must be manually removed from work to release space.

Generally, formats determine how data is displayed. Formats can save disk space because the data in the SAS data set is stored as the original value (in Figure 8 the original values are 1, 2, M, F) and then displayed in a longer form. In Figure 8, the savings would be substantial if the zips were stored as zips (19101) and then expanded to much longer values like "Main Postal Distribution Center, Philadelphia, Pennsylvania".

[Figure 9: SAS Format. Formats require creation of a cntlin file and storage of the format in the catalog.
    data CNTLIN (keep= fmtname start label type hlo);
      length label $ 5;        /* long enough for both "YES" and "other" */
      retain FMTNAME "InSml" TYPE "n" LABEL "YES";
      set small (rename=(pat_id=start)) end=last;
      output;
      if last=1 then do;       /* add the catch-all "other" row once, at the end */
        Hlo="O"; label="other"; start=.;
        output;
      end;
    run;
    proc format cntlin=cntlin;
    run;

    DATA N_SML_TOO;
      SET BIG;
      IF PUT(PAT_ID,InSml.)="YES";
    RUN;
    PROC PRINT; RUN;
]
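The format in Figure 9 only answers "YES" or "other", which is enough for subsetting. A format can also carry a looked-up value back to the large file. The sketch below assumes a hypothetical small file with a numeric key pat_id (no duplicates) and a character variable trt, and a hypothetical large file big that has pat_id but no trt; the format name trtfmt and the output data set names are also invented for illustration.

```sas
/* Build a control data set: one row per key, plus a catch-all row */
data cntlin_trt (keep=fmtname type start label hlo);
    length fmtname $ 32 type $ 1 label $ 8 hlo $ 1;
    retain fmtname 'trtfmt' type 'N';
    set small (keep=pat_id trt) end=last;
    start = pat_id;          /* the key value to match           */
    label = trt;             /* the value the format will return */
    hlo   = ' ';
    output;
    if last then do;         /* keys not present in SMALL        */
        start = .;
        label = 'unknown';
        hlo   = 'O';
        output;
    end;
run;

/* Store the format in the work catalog */
proc format cntlin=cntlin_trt;
run;

/* The lookup: one sequential pass of the large file, no sort, no merge */
data big_with_trt;
    set big;
    length trt $ 8;
    trt = put(pat_id, trtfmt.);
run;

/* Release the work space held by the control data set */
proc datasets lib=work nolist;
    delete cntlin_trt;
quit;
```

Duplicate keys in the small file would make Proc Format reject the control data set, so a real job would de-duplicate first. The final Proc Datasets step reflects the point made above that the cntlin file has to be removed by hand.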

While formats are often used for table lookup, they were not designed with this in mind. As a result, formats perform extra processing that is not required for the simple task of table lookup. When the Format Table Lookup fails, a programmer often tries an _IORC_ Merge. An _IORC_ Merge is based on a SAS index and is not a RAM resident technique. It is generally slower than a Format Table Lookup, but often faster than a Sorted By Merge.

Resource demand for Format Table Lookup: CPU: HIGH  DISKSPACE: Low  IO: Moderate

INDEXES: THE BASIS OF SEVERAL MERGE TECHNIQUES

Indexes can be created by several procs or with a data set option (Figure 10). They take time to create and disk space after they are created. They can be reused, so the cost of creation needs to be paid only once.

[Figure 10: SAS Index. The diagram shows a small file with its index and a big file with its index and a copy of the big file. Indexes are from 10% to 50% of the size of the source file. Indexes take time to create! Individual reads are slow! Indexes are good: 1) if they can be reused and 2) if you only want 5% to 10% of the big file.
    data small(index=(state));
      set small;
    Proc sql;
      create index state on work.small(state);
    quit;
    Proc datasets lib=work;
      Modify Large;
      index create zip;
    quit;
]

[Figure 11: Reminder: indexes use binary searches. INDEX lookup involves more operations than format lookup. The index file holds subj, page and row. Lets look for subject 9. Once 9 is found, accessing the information is a multi-step process:
  The index returns the page/block location (page and row) of the data.
  The controller moves the read head to find 5542 on the spinning drive.
  A page/block of data is read back from the disk drive to the CPU.
  The data is parsed to find the proper observation and field.]

Generally, indexes take more time to recover data than formats because the use of an index involves more steps than the use of a format, and some of the steps are slow. As a first step in data access, an index goes through the same binary search as a format.
However, a successful find does not return the desired information (see Figure 7) but rather a location on the hard drive where the information can be found (see Figure 11). Reading the information involves a slow (mechanical) disk read to find a page of data that contains several observations and then CPU cycles to select the correct observation and variable. Any mechanical process is slow and to be avoided.

Indexes are attractive if they return a small fraction of a file. If an index returns 20% of the large file, there are often faster techniques. If an index returns 75% of a file, the process will be slow.

THE _IORC_ MERGE

The _IORC_ merge requires the use of an index and is illustrated in Figure 12. The first set executes and loads the variables from that dataset (Day_1) into the PDV. The second set (with the key= option) uses the value of the key from the PDV to do an indexed read of the data set UpDt. In Figure 12 the lookup is successful and variables from UpDt are copied to the PDV. A successful index lookup results in a value of 0 being written to the variable _IORC_ on the PDV.

[Figure 12: A successful lookup. The figure shows the data sets Day_1 and UpDt, the data vector (Name, Run, Sh_Sp, T_Elb, _N_, _ERROR_, _IORC_) holding Bob Y S N, and the output file row Bob Y S N. SYNTAX:
    data new3;
      set Day_1;
      set UpDt key=name/unique;
      array setmiss(*) $ ShinSpl -- T_Elb;
      if _iorc_ then do;           /* non-zero means the lookup failed   */
        _error_=0; _IORC_=0;       /* clear the error flags              */
        do i= 1 to dim(setmiss);   /* blank out values left from a prior */
          setmiss(i)="";           /* successful lookup                  */
        end;
      end;
    run;
]

An unsuccessful attempt to do a table lookup is shown in Figure 13. When the second set statement fails to find a match in the index lookup, it writes a non-zero value to _IORC_. The nonzero value of _IORC_ causes the do group to execute and reset variables in the PDV.
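The test "if _iorc_ then do" treats every non-zero return code as "no match". A common, slightly more defensive variant names the return codes with the %SYSRC autocall macro so that unexpected conditions are not silently treated as a non-match. The sketch below reuses the data set and variable names from the figure code; it assumes that UpDt already carries a unique index on name and that ShinSpl through T_Elb are the variables read from UpDt.

```sas
/* Assumes UpDt has a unique index on NAME, created for example with
   PROC DATASETS (modify UpDt; index create name / unique;)          */
data new3;
    set Day_1;                       /* drives the lookup, one read per obs */
    set UpDt key=name / unique;      /* indexed read using NAME from the PDV */

    select (_iorc_);
        when (%sysrc(_sok)) ;        /* match: UpDt values are now in the PDV */
        when (%sysrc(_dsenom)) do;   /* no matching observation in the index  */
            _error_ = 0;             /* stop SAS from flagging the miss as an error */
            call missing(of ShinSpl -- T_Elb);  /* clear values left over
                                                   from the last match */
        end;
        otherwise do;                /* anything else is unexpected */
            put 'ERROR: unexpected _IORC_ value ' _iorc_=;
            stop;
        end;
    end;
run;
```

The reset step matters because variables read by a set statement are retained: without it, an unmatched observation would silently carry the values from the previous successful lookup.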
[Figure 13: An unsuccessful lookup. Same data sets and SYNTAX as Figure 12. The lookup for Russ finds no match: the data vector holds Russ N with the looked-up variables blank, and the output file now contains Bob Y S N and Russ N.]

Every observation in the data set mentioned in the first set statement triggers an index lookup. This technique is good if the number of observations in the first file is less than 10% of the number of observations in the second file and poor at 30% or more.

Resource demand for _IORC_ Table Lookup: CPU: HIGH  DISKSPACE: Low  IO: HIGH

THE TAGSORT

A tagsort (Figure 14) involves the creation of a secondary file containing the key and a pointer to the location of the observation on the disk (like an index). Tagsorts are easy to program and can be used to support a Sorted By Merge without sorting a file. The tagsort process is similar to an automated _IORC_ merge in that the basic process is an index lookup. It has all the problems associated with index usage if the small file has more than 5% of the number of obs in the large file.

Resource demand for Tagsort Table Lookup: CPU: HIGH  DISKSPACE: Low  IO: HIGH

[Figure 14: SAS Tagsort. Tagsorts create a secondary file containing the sort keys (state, zip) and a pointer, and a sort file for the secondary file. Using a tagsort involves searching in the sorted secondary file and an indexed lookup using the pointer.
    proc sort data=big tagsort;
      By state zip;
Proc Contents shows: Sortedby: zip  Validated: YES]

KEY INDEXING, BITMAPPING AND MANUAL HASHING

Key indexing, bitmapping and hashing have been described in a series of articles written by Dr. Paul Dorfman. These are very fast table lookup techniques because they load the whole small data set into standard SAS arrays, which must exist in RAM. These techniques can be difficult to code and are generally a challenge for the programmer who inherits the hashing code. In key indexing, bitmapping and manual hashing, a mathematical function reads the key information from the PDV and calculates the proper bucket in the array in one fast step.

Resource demand for Key indexing, etc. Table Lookup: CPU: HIGH  DISKSPACE: VERY LOW  IO: LOW

[Figure 15: Key indexing, bitmapping & hashing. The small file (key plus one variable at most) is loaded, with no sorting, into an array of keys held in RAM by the DATA step, and is AUTOMATICALLY de-duped. The large file (key, var1, var2, etc.) is then read against the array to produce the result file. High memory usage. FAST, if the technique applies: SAS array characteristics (max value in cell) will limit it.]
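Key indexing is the simplest of the three techniques to sketch: the key value itself is the array subscript, so the "mathematical function" is just the identity. The example below is an illustration under stated assumptions, not code from the cited articles. It assumes hypothetical data sets small and big that both carry a numeric key zip_num holding whole numbers from 0 to 99999 with no missing values.

```sas
data result;
    /* One cell per possible key value; the array lives only in RAM */
    array seen(0:99999) _temporary_;

    /* Pass 1: flag every key that occurs in the small file.
       Repeated keys simply set the same cell again, so the small
       file is de-duped as a side effect.                          */
    do until (eof_small);
        set small (keep=zip_num) end=eof_small;
        seen(zip_num) = 1;
    end;

    /* Pass 2: one sequential read of the large file. The key is
       used directly as the subscript, so each lookup is one step. */
    do until (eof_big);
        set big end=eof_big;
        if seen(zip_num) then output;
    end;
    stop;
run;
```

The array size is fixed by the range of possible key values, not by the number of rows in the small file, which is why wide-ranging keys push programmers toward bitmapping or hashing instead.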
Figure 15 also notes that you can code your own hashing, or use the SAS-coded hashing in V9. As we process the large file, quick access to the values in the array lets us determine if we want the obs from the large file in the result file.

V9 HASHING

In response to the challenge of coding hashing algorithms, SAS has created an easy-to-use hashing applet in V9. The applet can be called from within the data step and creates something like a very smart array.

[Figure 16: Hashing uses two searches. The figure contrasts a format file (subjects 001 to 032, each assigned test or control, held as one ordered list) with a hash structure holding the same records. Lets look for subject 7. Hashing divides the file into buckets, with a tree structure below the buckets. The hash function gets you directly to a bucket. The method then searches down the tree. Find puts matches on the PDV. If there is no match, RC is not zero.]

Hashing is production in V9 and preliminary speed tests show it to be a fast technique. To get maximum speed from the technique, the whole small file should be memory-resident. On some platforms, SAS may crash if the hash object does not fit in RAM.

V9 hashing has a new file structure and searches it using a two part algorithm. The file structure/algorithm is one reason that hashing is generally faster than formats for table lookup. An additional factor is that the hashing algorithm was designed to do table lookup and, unlike formats, performs few unneeded operations.
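A minimal sketch of a hash object table lookup follows. The data set names small and big, the key pat_id and the looked-up variable trt are assumptions made for illustration; the sketch also assumes that big does not already contain trt.

```sas
data result;
    /* Never executed; it only makes PAT_ID and TRT known to the
       compiler so the hash object has host variables to fill.   */
    if 0 then set small (keep=pat_id trt);

    /* Load the whole small file into memory once */
    if _n_ = 1 then do;
        declare hash h (dataset: 'small');
        h.defineKey('pat_id');
        h.defineData('trt');
        h.defineDone();
    end;

    /* One sequential read of the large file, no sorting needed */
    set big;

    /* find() returns 0 on a match and copies TRT into the PDV;
       a non-zero return code means the key is not in SMALL.    */
    if h.find() = 0 then output;
run;
```

To keep non-matching rows as well (a left join rather than a subset), the last statement could instead clear trt with call missing when find() returns a non-zero code and then output every row.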
Resource demand for V9 Hashing Table Lookup: CPU: HIGH  DISKSPACE: VERY LOW  IO: LOW

SQL

SQL is a powerful tool for table lookups. The strength of SQL is its ease of use and the time it saves on the programming task. Complex operations can easily be coded in SQL. Generally, SQL has lost out in speed tests against Sorted By Merges. The basic SQL process creates a Cartesian product that can be very large. Much development work has been done to minimize the space requirements of SQL, but if space and run time are the problems forcing a programmer to explore efficiency, SQL will generally not be the solution.

Resource demand for SQL Table Lookup: CPU: HIGH  DISKSPACE: MED-HIGH  IO: MED-HIGH

CONCLUSION

Figure 17 presents a rough guide to the use of these techniques. It is hoped that the information provided here will point a reader towards the technique that best solves her/his efficiency problem. For cheap and quick access to additional information on all of these topics, request a copy of the searchable database of online proceedings papers from the author. The database can be used to identify articles on the subject of interest that can then be downloaded from the SUGI and NESUG web sites.

[Figure 17: TOOLS FOR TABLE LOOKUP. Our list of tools for table lookup, annotated in the original graphic with the labels "fast", "Intensive" and "Memory Intensive":
  By Merges & SQL Joins
    Tag Sorted by merge
    Indexed by merge
    SQL Joins
    SAS Sorted by merge
    Host System Sorted by merge
    Asserted to be sorted by merge
  Update
  _IORC_ MERGE as table lookup
  Formats as table lookup
  SAS V9 Hashing
  Custom Coded Hashing
  Custom Coded Bitmapping
  Custom Coded Key Indexing]

REFERENCES

Several years of proceedings are available online for free. A MS Access database listing online SUGI/NESUG articles is available from the author. It can be searched for articles on formats, _IORC_, etc. that can then be downloaded from the web. Contact the author for a copy.

For general understanding of how things work in SAS:
  Aster & Seidelman, Professional Programming Secrets, McGraw Hill
  Virgile, Efficiency: Improving the Performance of your SAS Applications, SAS Institute
  SAS course notes 58032: Optimizing s

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:
  Russell Lavery, Independent Contractor
  9 Station Ave. Apt 1
  Ardmore, PA
  russell.lavery@verizon.net

SAS is a registered trademark of SAS Institute, Inc., in the USA and other countries. ® indicates US registration.


More information

CS510 Operating System Foundations. Jonathan Walpole

CS510 Operating System Foundations. Jonathan Walpole CS510 Operating System Foundations Jonathan Walpole Disk Technology & Secondary Storage Management Disk Geometry Disk head, surfaces, tracks, sectors Example Disk Characteristics Disk Surface Geometry

More information

An Animated Guide: Proc Transpose

An Animated Guide: Proc Transpose ABSTRACT An Animated Guide: Proc Transpose Russell Lavery, Independent Consultant If one can think about a SAS data set as being made up of columns and rows one can say Proc Transpose flips the columns

More information

What did we talk about last time? Finished hunters and prey Class variables Constants Class constants Started Big Oh notation

What did we talk about last time? Finished hunters and prey Class variables Constants Class constants Started Big Oh notation Week 12 - Friday What did we talk about last time? Finished hunters and prey Class variables Constants Class constants Started Big Oh notation Here is some code that sorts an array in ascending order

More information

CSCI S-Q Lecture #12 7/29/98 Data Structures and I/O

CSCI S-Q Lecture #12 7/29/98 Data Structures and I/O CSCI S-Q Lecture #12 7/29/98 Data Structures and I/O Introduction The WRITE and READ ADT Operations Case Studies: Arrays Strings Binary Trees Binary Search Trees Unordered Search Trees Page 1 Introduction

More information

Reducing SAS Dataset Merges with Data Driven Formats

Reducing SAS Dataset Merges with Data Driven Formats Paper CT01 Reducing SAS Dataset Merges with Data Driven Formats Paul Grimsey, Roche Products Ltd, Welwyn Garden City, UK ABSTRACT Merging different data sources is necessary in the creation of analysis

More information

CS 31: Intro to Systems Caching. Kevin Webb Swarthmore College March 24, 2015

CS 31: Intro to Systems Caching. Kevin Webb Swarthmore College March 24, 2015 CS 3: Intro to Systems Caching Kevin Webb Swarthmore College March 24, 205 Reading Quiz Abstraction Goal Reality: There is no one type of memory to rule them all! Abstraction: hide the complex/undesirable

More information

Lecture 12. Lecture 12: The IO Model & External Sorting

Lecture 12. Lecture 12: The IO Model & External Sorting Lecture 12 Lecture 12: The IO Model & External Sorting Lecture 12 Today s Lecture 1. The Buffer 2. External Merge Sort 2 Lecture 12 > Section 1 1. The Buffer 3 Lecture 12 > Section 1 Transition to Mechanisms

More information

SAS File Management. Improving Performance CHAPTER 37

SAS File Management. Improving Performance CHAPTER 37 519 CHAPTER 37 SAS File Management Improving Performance 519 Moving SAS Files Between Operating Environments 520 Converting SAS Files 520 Repairing Damaged Files 520 Recovering SAS Data Files 521 Recovering

More information

File System Interface and Implementation

File System Interface and Implementation Unit 8 Structure 8.1 Introduction Objectives 8.2 Concept of a File Attributes of a File Operations on Files Types of Files Structure of File 8.3 File Access Methods Sequential Access Direct Access Indexed

More information

(Refer Slide Time: 01:25)

(Refer Slide Time: 01:25) Computer Architecture Prof. Anshul Kumar Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture - 32 Memory Hierarchy: Virtual Memory (contd.) We have discussed virtual

More information

Optimizing System Performance

Optimizing System Performance 243 CHAPTER 19 Optimizing System Performance Definitions 243 Collecting and Interpreting Performance Statistics 244 Using the FULLSTIMER and STIMER System Options 244 Interpreting FULLSTIMER and STIMER

More information

OPERATING SYSTEMS. After A.S.Tanenbaum, Modern Operating Systems 3rd edition Uses content with permission from Assoc. Prof. Florin Fortis, PhD

OPERATING SYSTEMS. After A.S.Tanenbaum, Modern Operating Systems 3rd edition Uses content with permission from Assoc. Prof. Florin Fortis, PhD OPERATING SYSTEMS #8 After A.S.Tanenbaum, Modern Operating Systems 3rd edition Uses content with permission from Assoc. Prof. Florin Fortis, PhD MEMORY MANAGEMENT MEMORY MANAGEMENT The memory is one of

More information

AVL 4 4 PDV DECLARE 7 _NEW_

AVL 4 4 PDV DECLARE 7 _NEW_ Glossary Program Control... 2 SAS Variable... 2 Program Data Vector (PDV)... 2 SAS Expression... 2 Data Type... 3 Scalar... 3 Non-Scalar... 3 Big O Notation... 3 Hash Table... 3 Hash Algorithm... 4 Hash

More information

Operating Systems Unit 6. Memory Management

Operating Systems Unit 6. Memory Management Unit 6 Memory Management Structure 6.1 Introduction Objectives 6.2 Logical versus Physical Address Space 6.3 Swapping 6.4 Contiguous Allocation Single partition Allocation Multiple Partition Allocation

More information

using and Understanding Formats

using and Understanding Formats using and Understanding SAS@ Formats Howard Levine, DynaMark, Inc. Oblectives The purpose of this paper is to enable you to use SAS formats to perform the following tasks more effectively: Improving the

More information

Getting the Most from Hash Objects. Bharath Gowda

Getting the Most from Hash Objects. Bharath Gowda Getting the Most from Hash Objects Bharath Gowda Getting the most from Hash objects Techniques covered are: SQL join Data step merge using BASE engine Data step merge using SPDE merge Index Key lookup

More information

BEYOND FORMAT BASICS 1

BEYOND FORMAT BASICS 1 BEYOND FORMAT BASICS 1 CNTLIN DATA SETS...LABELING VALUES OF VARIABLE One common use of a format in SAS is to assign labels to values of a variable. The rules for creating a format with PROC FORMAT are

More information

Hash Objects Why Bother? Barb Crowther SAS Technical Training Specialist. Copyright 2008, SAS Institute Inc. All rights reserved.

Hash Objects Why Bother? Barb Crowther SAS Technical Training Specialist. Copyright 2008, SAS Institute Inc. All rights reserved. Hash Objects Why Bother? Barb Crowther SAS Technical Training Specialist Purpose The purpose of this presentation is not to teach you how to program Hash Objects That s a two hour topic in PRG3. The purpose

More information

CSC 261/461 Database Systems Lecture 17. Fall 2017

CSC 261/461 Database Systems Lecture 17. Fall 2017 CSC 261/461 Database Systems Lecture 17 Fall 2017 Announcement Quiz 6 Due: Tonight at 11:59 pm Project 1 Milepost 3 Due: Nov 10 Project 2 Part 2 (Optional) Due: Nov 15 The IO Model & External Sorting Today

More information

Raima Database Manager Version 14.1 In-memory Database Engine

Raima Database Manager Version 14.1 In-memory Database Engine + Raima Database Manager Version 14.1 In-memory Database Engine By Jeffrey R. Parsons, Chief Engineer November 2017 Abstract Raima Database Manager (RDM) v14.1 contains an all new data storage engine optimized

More information

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions...

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions... Contents Contents...283 Introduction...283 Basic Steps in Query Processing...284 Introduction...285 Transformation of Relational Expressions...287 Equivalence Rules...289 Transformation Example: Pushing

More information

PROC FORMAT. CMS SAS User Group Conference October 31, 2007 Dan Waldo

PROC FORMAT. CMS SAS User Group Conference October 31, 2007 Dan Waldo PROC FORMAT CMS SAS User Group Conference October 31, 2007 Dan Waldo 1 Today s topic: Three uses of formats 1. To improve the user-friendliness of printed results 2. To group like data values without affecting

More information

An Introduction to Compressing Data Sets J. Meimei Ma, Quintiles

An Introduction to Compressing Data Sets J. Meimei Ma, Quintiles An Introduction to Compressing Data Sets J. Meimei Ma, Quintiles r:, INTRODUCTION This tutorial introduces compressed data sets. The SAS system compression algorithm is described along with basic syntax.

More information

T.I.P.S. (Techniques and Information for Programming in SAS )

T.I.P.S. (Techniques and Information for Programming in SAS ) Paper PO-088 T.I.P.S. (Techniques and Information for Programming in SAS ) Kathy Harkins, Carolyn Maass, Mary Anne Rutkowski Merck Research Laboratories, Upper Gwynedd, PA ABSTRACT: This paper provides

More information

Administration Naive DBMS CMPT 454 Topics. John Edgar 2

Administration Naive DBMS CMPT 454 Topics. John Edgar 2 Administration Naive DBMS CMPT 454 Topics John Edgar 2 http://www.cs.sfu.ca/coursecentral/454/johnwill/ John Edgar 4 Assignments 25% Midterm exam in class 20% Final exam 55% John Edgar 5 A database stores

More information

Preview. Memory Management

Preview. Memory Management Preview Memory Management With Mono-Process With Multi-Processes Multi-process with Fixed Partitions Modeling Multiprogramming Swapping Memory Management with Bitmaps Memory Management with Free-List Virtual

More information

Streamline Table Lookup by Embedding HASH in FCMP Qing Liu, Eli Lilly & Company, Shanghai, China

Streamline Table Lookup by Embedding HASH in FCMP Qing Liu, Eli Lilly & Company, Shanghai, China ABSTRACT PharmaSUG China 2017 - Paper 19 Streamline Table Lookup by Embedding HASH in FCMP Qing Liu, Eli Lilly & Company, Shanghai, China SAS provides many methods to perform a table lookup like Merge

More information

A Simple Framework for Sequentially Processing Hierarchical Data Sets for Large Surveys

A Simple Framework for Sequentially Processing Hierarchical Data Sets for Large Surveys A Simple Framework for Sequentially Processing Hierarchical Data Sets for Large Surveys Richard L. Downs, Jr. and Pura A. Peréz U.S. Bureau of the Census, Washington, D.C. ABSTRACT This paper explains

More information

SYSTEM 2000 Essentials

SYSTEM 2000 Essentials 7 CHAPTER 2 SYSTEM 2000 Essentials Introduction 7 SYSTEM 2000 Software 8 SYSTEM 2000 Databases 8 Database Name 9 Labeling Data 9 Grouping Data 10 Establishing Relationships between Schema Records 10 Logical

More information

Format-o-matic: Using Formats To Merge Data From Multiple Sources

Format-o-matic: Using Formats To Merge Data From Multiple Sources SESUG Paper 134-2017 Format-o-matic: Using Formats To Merge Data From Multiple Sources Marcus Maher, Ipsos Public Affairs; Joe Matise, NORC at the University of Chicago ABSTRACT User-defined formats are

More information

Chapter 17. Disk Storage, Basic File Structures, and Hashing. Records. Blocking

Chapter 17. Disk Storage, Basic File Structures, and Hashing. Records. Blocking Chapter 17 Disk Storage, Basic File Structures, and Hashing Records Fixed and variable length records Records contain fields which have values of a particular type (e.g., amount, date, time, age) Fields

More information

16 Sharing Main Memory Segmentation and Paging

16 Sharing Main Memory Segmentation and Paging Operating Systems 64 16 Sharing Main Memory Segmentation and Paging Readings for this topic: Anderson/Dahlin Chapter 8 9; Siberschatz/Galvin Chapter 8 9 Simple uniprogramming with a single segment per

More information

Hash-Based Indexes. Chapter 11

Hash-Based Indexes. Chapter 11 Hash-Based Indexes Chapter 11 1 Introduction : Hash-based Indexes Best for equality selections. Cannot support range searches. Static and dynamic hashing techniques exist: Trade-offs similar to ISAM vs.

More information

Why choose between SAS Data Step and PROC SQL when you can have both?

Why choose between SAS Data Step and PROC SQL when you can have both? Paper QT-09 Why choose between SAS Data Step and PROC SQL when you can have both? Charu Shankar, SAS Canada ABSTRACT As a SAS coder, you've often wondered what the SQL buzz is about. Or vice versa you

More information

Query Processing & Optimization

Query Processing & Optimization Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction

More information

9 Ways to Join Two Datasets David Franklin, Independent Consultant, New Hampshire, USA

9 Ways to Join Two Datasets David Franklin, Independent Consultant, New Hampshire, USA 9 Ways to Join Two Datasets David Franklin, Independent Consultant, New Hampshire, USA ABSTRACT Joining or merging data is one of the fundamental actions carried out when manipulating data to bring it

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Lecture 13. Lecture 13: B+ Tree

Lecture 13. Lecture 13: B+ Tree Lecture 13 Lecture 13: B+ Tree Lecture 13 Announcements 1. Project Part 2 extension till Friday 2. Project Part 3: B+ Tree coming out Friday 3. Poll for Nov 22nd 4. Exam Pickup: If you have questions,

More information

Updating Data Using the MODIFY Statement and the KEY= Option

Updating Data Using the MODIFY Statement and the KEY= Option Updating Data Using the MODIFY Statement and the KEY= Option Denise J. Moorman and Deanna Warner Denise J. Moorman is a technical support analyst at SAS Institute. Her area of expertise is base SAS software.

More information

BI-09 Using Enterprise Guide Effectively Tom Miron, Systems Seminar Consultants, Madison, WI

BI-09 Using Enterprise Guide Effectively Tom Miron, Systems Seminar Consultants, Madison, WI Paper BI09-2012 BI-09 Using Enterprise Guide Effectively Tom Miron, Systems Seminar Consultants, Madison, WI ABSTRACT Enterprise Guide is not just a fancy program editor! EG offers a whole new window onto

More information

File system internals Tanenbaum, Chapter 4. COMP3231 Operating Systems

File system internals Tanenbaum, Chapter 4. COMP3231 Operating Systems File system internals Tanenbaum, Chapter 4 COMP3231 Operating Systems Architecture of the OS storage stack Application File system: Hides physical location of data on the disk Exposes: directory hierarchy,

More information

Ext3/4 file systems. Don Porter CSE 506

Ext3/4 file systems. Don Porter CSE 506 Ext3/4 file systems Don Porter CSE 506 Logical Diagram Binary Formats Memory Allocators System Calls Threads User Today s Lecture Kernel RCU File System Networking Sync Memory Management Device Drivers

More information

Typical File Extensions File Structure

Typical File Extensions File Structure CS 355 Operating Systems File Systems File Systems A file is a collection of data records grouped together for purpose of access control and modification A file system is software responsible for creating,

More information

The inner workings of the datastep. By Mathieu Gaouette Videotron

The inner workings of the datastep. By Mathieu Gaouette Videotron The inner workings of the datastep By Mathieu Gaouette Videotron Plan Introduction The base The base behind the scene Control in the datastep A side by side compare with Proc SQL Introduction Most of you

More information

CMU MSP : SAS FORMATs and INFORMATs Howard Seltman Nov. 7+12, 2018

CMU MSP : SAS FORMATs and INFORMATs Howard Seltman Nov. 7+12, 2018 CMU MSP 36-601: SAS FORMATs and INFORMATs Howard Seltman Nov. 7+12, 2018 1) Formats and informats flexibly re-represent data in a data set on input or output. Common uses include reading and writing dates,

More information

Essentials of PDV: Directing the Aim to Understanding the DATA Step! Arthur Xuejun Li, City of Hope National Medical Center, Duarte, CA

Essentials of PDV: Directing the Aim to Understanding the DATA Step! Arthur Xuejun Li, City of Hope National Medical Center, Duarte, CA PharmaSUG 2013 - Paper TF17 Essentials of PDV: Directing the Aim to Understanding the DATA Step! Arthur Xuejun Li, City of Hope National Medical Center, Duarte, CA ABSTRACT Beginning programmers often

More information

Merge Sort Roberto Hibbler Dept. of Computer Science Florida Institute of Technology Melbourne, FL

Merge Sort Roberto Hibbler Dept. of Computer Science Florida Institute of Technology Melbourne, FL Merge Sort Roberto Hibbler Dept. of Computer Science Florida Institute of Technology Melbourne, FL 32901 rhibbler@cs.fit.edu ABSTRACT Given an array of elements, we want to arrange those elements into

More information

Unlock SAS Code Automation with the Power of Macros

Unlock SAS Code Automation with the Power of Macros SESUG 2015 ABSTRACT Paper AD-87 Unlock SAS Code Automation with the Power of Macros William Gui Zupko II, Federal Law Enforcement Training Centers SAS code, like any computer programming code, seems to

More information

How TokuDB Fractal TreeTM. Indexes Work. Bradley C. Kuszmaul. MySQL UC 2010 How Fractal Trees Work 1

How TokuDB Fractal TreeTM. Indexes Work. Bradley C. Kuszmaul. MySQL UC 2010 How Fractal Trees Work 1 MySQL UC 2010 How Fractal Trees Work 1 How TokuDB Fractal TreeTM Indexes Work Bradley C. Kuszmaul MySQL UC 2010 How Fractal Trees Work 2 More Information You can download this talk and others at http://tokutek.com/technology

More information

Version 8 Base SAS Performance: How Does It Stack-Up? Robert Ray, SAS Institute Inc, Cary, NC

Version 8 Base SAS Performance: How Does It Stack-Up? Robert Ray, SAS Institute Inc, Cary, NC Paper 9-25 Version 8 Base SAS Performance: How Does It Stack-Up? Robert Ray, SAS Institute Inc, Cary, NC ABSTRACT This paper presents the results of a study conducted at SAS Institute Inc to compare the

More information

How Oracle Essbase Aggregate Storage Option. And How to. Dan Pressman

How Oracle Essbase Aggregate Storage Option. And How to. Dan Pressman How Oracle Essbase Aggregate Storage Option And How to Dan Pressman San Francisco, CA October 1, 2012 Assumption, Basis and a Caveat Assumption: Basic understanding of ASO cubes Basis: My chapter How ASO

More information

Tips & Tricks. With lots of help from other SUG and SUGI presenters. SAS HUG Meeting, November 18, 2010

Tips & Tricks. With lots of help from other SUG and SUGI presenters. SAS HUG Meeting, November 18, 2010 Tips & Tricks With lots of help from other SUG and SUGI presenters 1 SAS HUG Meeting, November 18, 2010 2 3 Sorting Threads Multi-threading available if your computer has more than one processor (CPU)

More information

Chapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction

Chapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction Chapter 6 Objectives Chapter 6 Memory Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured.

More information

CS 31: Intro to Systems Caching. Martin Gagne Swarthmore College March 23, 2017

CS 31: Intro to Systems Caching. Martin Gagne Swarthmore College March 23, 2017 CS 1: Intro to Systems Caching Martin Gagne Swarthmore College March 2, 2017 Recall A cache is a smaller, faster memory, that holds a subset of a larger (slower) memory We take advantage of locality to

More information

From Manual to Automatic with Overdrive - Using SAS to Automate Report Generation Faron Kincheloe, Baylor University, Waco, TX

From Manual to Automatic with Overdrive - Using SAS to Automate Report Generation Faron Kincheloe, Baylor University, Waco, TX Paper 152-27 From Manual to Automatic with Overdrive - Using SAS to Automate Report Generation Faron Kincheloe, Baylor University, Waco, TX ABSTRACT This paper is a case study of how SAS products were

More information

File Management. Marc s s first try, Please don t t sue me.

File Management. Marc s s first try, Please don t t sue me. File Management Marc s s first try, Please don t t sue me. Introduction Files Long-term existence Can be temporally decoupled from applications Sharable between processes Can be structured to the task

More information

Are you Still Afraid of Using Arrays? Let s Explore their Advantages

Are you Still Afraid of Using Arrays? Let s Explore their Advantages Paper CT07 Are you Still Afraid of Using Arrays? Let s Explore their Advantages Vladyslav Khudov, Experis Clinical, Kharkiv, Ukraine ABSTRACT At first glance, arrays in SAS seem to be a complicated and

More information

CS5460: Operating Systems Lecture 20: File System Reliability

CS5460: Operating Systems Lecture 20: File System Reliability CS5460: Operating Systems Lecture 20: File System Reliability File System Optimizations Modern Historic Technique Disk buffer cache Aggregated disk I/O Prefetching Disk head scheduling Disk interleaving

More information

SAS Data View and Engine Processing. Defining a SAS Data View. Advantages of SAS Data Views SAS DATA VIEWS: A VIRTUAL VIEW OF DATA

SAS Data View and Engine Processing. Defining a SAS Data View. Advantages of SAS Data Views SAS DATA VIEWS: A VIRTUAL VIEW OF DATA SAS DATA VIEWS: A VIRTUAL VIEW OF DATA John C. Boling SAS Institute Inc., Cary, NC Abstract The concept of a SAS data set has been extended or broadened in Version 6 of the SAS System. Two SAS file structures

More information

Lecture 12. Lecture 12: Access Methods

Lecture 12. Lecture 12: Access Methods Lecture 12 Lecture 12: Access Methods Lecture 12 If you don t find it in the index, look very carefully through the entire catalog - Sears, Roebuck and Co., Consumers Guide, 1897 2 Lecture 12 > Section

More information

Motivation. Operating Systems. File Systems. Outline. Files: The User s Point of View. File System Concepts. Solution? Files!

Motivation. Operating Systems. File Systems. Outline. Files: The User s Point of View. File System Concepts. Solution? Files! Motivation Operating Systems Process store, retrieve information Process capacity restricted to vmem size When process terminates, memory lost Multiple processes share information Systems (Ch 0.-0.4, Ch.-.5)

More information

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)

EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture) EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture) Dept. of Computer Science & Engineering Chentao Wu wuct@cs.sjtu.edu.cn Download lectures ftp://public.sjtu.edu.cn User:

More information

Chapter 6: Modifying and Combining Data Sets

Chapter 6: Modifying and Combining Data Sets Chapter 6: Modifying and Combining Data Sets The SET statement is a powerful statement in the DATA step. Its main use is to read in a previously created SAS data set which can be modified and saved as

More information

Scalable Access to SAS Data Billy Clifford, SAS Institute Inc., Austin, TX

Scalable Access to SAS Data Billy Clifford, SAS Institute Inc., Austin, TX Scalable Access to SAS Data Billy Clifford, SAS Institute Inc., Austin, TX ABSTRACT Symmetric multiprocessor (SMP) computers can increase performance by reducing the time required to analyze large volumes

More information

Chapter 17: Distributed Systems (DS)

Chapter 17: Distributed Systems (DS) Chapter 17: Distributed Systems (DS) Silberschatz, Galvin and Gagne 2013 Chapter 17: Distributed Systems Advantages of Distributed Systems Types of Network-Based Operating Systems Network Structure Communication

More information

PharmaSUG Paper BB01

PharmaSUG Paper BB01 PharmaSUG 2014 - Paper BB01 Indexing: A powerful technique for improving efficiency Arun Raj Vidhyadharan, inventiv Health, Somerset, NJ Sunil Mohan Jairath, inventiv Health, Somerset, NJ ABSTRACT The

More information

UNIT III MEMORY MANAGEMENT

UNIT III MEMORY MANAGEMENT UNIT III MEMORY MANAGEMENT TOPICS TO BE COVERED 3.1 Memory management 3.2 Contiguous allocation i Partitioned memory allocation ii Fixed & variable partitioning iii Swapping iv Relocation v Protection

More information

Indexing. Chapter 8, 10, 11. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Indexing. Chapter 8, 10, 11. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Indexing Chapter 8, 10, 11 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Tree-Based Indexing The data entries are arranged in sorted order by search key value. A hierarchical search

More information

Introduction to File Structures

Introduction to File Structures 1 Introduction to File Structures Introduction to File Organization Data processing from a computer science perspective: Storage of data Organization of data Access to data This will be built on your knowledge

More information

Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA

Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA Indexing and Compressing SAS Data Sets: How, Why, and Why Not Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA Many users of SAS System software, especially those working

More information

CSE380 - Operating Systems. Communicating with Devices

CSE380 - Operating Systems. Communicating with Devices CSE380 - Operating Systems Notes for Lecture 15-11/4/04 Matt Blaze (some examples by Insup Lee) Communicating with Devices Modern architectures support convenient communication with devices memory mapped

More information

CS125 : Introduction to Computer Science. Lecture Notes #38 and #39 Quicksort. c 2005, 2003, 2002, 2000 Jason Zych

CS125 : Introduction to Computer Science. Lecture Notes #38 and #39 Quicksort. c 2005, 2003, 2002, 2000 Jason Zych CS125 : Introduction to Computer Science Lecture Notes #38 and #39 Quicksort c 2005, 2003, 2002, 2000 Jason Zych 1 Lectures 38 and 39 : Quicksort Quicksort is the best sorting algorithm known which is

More information

Using Data Set Options in PROC SQL Kenneth W. Borowiak Howard M. Proskin & Associates, Inc., Rochester, NY

Using Data Set Options in PROC SQL Kenneth W. Borowiak Howard M. Proskin & Associates, Inc., Rochester, NY Using Data Set Options in PROC SQL Kenneth W. Borowiak Howard M. Proskin & Associates, Inc., Rochester, NY ABSTRACT Data set options are an often over-looked feature when querying and manipulating SAS

More information

Stephen M. Beatrous, SAS Institute Inc., Cary, NC John T. Stokes, SAS Institute Inc., Austin, TX

Stephen M. Beatrous, SAS Institute Inc., Cary, NC John T. Stokes, SAS Institute Inc., Austin, TX 1/0 Performance Improvements in Release 6.07 of the SAS System under MVS, ems, and VMS' Stephen M. Beatrous, SAS Institute Inc., Cary, NC John T. Stokes, SAS Institute Inc., Austin, TX INTRODUCTION The

More information

Overview of HASH Objects Swarnalatha Gaddam, Cytel Inc. Hyderabad, India

Overview of HASH Objects Swarnalatha Gaddam, Cytel Inc. Hyderabad, India PhUSE 2014 Paper CS04 Overview of HASH Objects Swarnalatha Gaddam, Cytel Inc. Hyderabad, India Abstract: This topic is intended to provide more exposure to beginner or experienced SAS programmers who are

More information