Stephen M. Beatrous, SAS Institute Inc., Cary, NC John T. Stokes, SAS Institute Inc., Austin, TX
I/O Performance Improvements in Release 6.07 of the SAS System under MVS, CMS, and VMS

INTRODUCTION

The goals of Release 6.06 of the SAS® System were to incorporate new functionality and to add an entirely new look and feel to the SAS System while continuing to support most features from Version 5. Release 6.06 was indeed more flexible and powerful than Version 5, but it was not as fast.

The goal of Release 6.07 of the SAS System was improved performance. Release 6.07 refines the powerful system introduced in Release 6.06 so that it is as fast as or faster than Version 5; that is, Release 6.07 provides all the power and flexibility of Release 6.06 without sacrificing performance. In short, with Release 6.07, SAS Institute is giving you something (advanced features) for nothing (performance comparable to Version 5).

Figure 2  Comparison of Sequential Writes (CPU time in seconds by observation length and release)

Many aspects of the SAS System have been optimized to improve its overall performance in Release 6.07. This paper describes the most significant of the I/O enhancements for Release 6.07. The data presented in this paper were collected under MVS. The conclusions have been verified on the SAS System under CMS and VMS.

SEQUENTIAL READS AND WRITES

Most SAS applications sequentially process the observations within a SAS data file. Activities such as merging files, producing reports, analyzing data, and generating graphs involve sequentially reading and writing observations in SAS data files. A major goal for Release 6.07 development was to improve the CPU performance of sequential I/O. Release 6.07 uses the following two techniques to optimize sequential access to SAS files:

- streamlining the code path for sequential access
- setting default I/O buffer sizes to favor sequential processing.
Figure 1 summarizes the impact of the streamlined code path and the new default buffer sizes on the CPU performance of sequential reads. Figure 2 summarizes sequential writes.

Figure 1  Comparison of Sequential Reads (CPU time in seconds by observation length and release)

Figure 1 and Figure 2 show the CPU time required to sequentially process 100,000 observations in Releases 5.18, 6.06, and 6.07. For all releases, the CPU time required to process a file increases as the observation length increases. The slope of the line describing the increase, however, is greatest in Release 6.06. The slopes for Releases 5.18 and 6.07 are roughly equivalent. Release 6.07 consistently uses less CPU time than Release [...].

Streamlining Code

In both Release 6.06 and Release 6.07, a SAS procedure specifies what type of access it requires when it opens a SAS data file. In Release 6.07, when a procedure opens a file for sequential access, the SAS System uses a streamlined set of subroutines to process that file. These subroutines optimize individual reads and writes by

- avoiding movement of data out of I/O buffers
- bypassing unnecessary checks (for example, if you don't use the OBS= option, the SAS System no longer checks to see if each read has passed the OBS= option limit)
- reducing the layers of code that an observation must pass through.

These streamlined techniques can greatly reduce the amount of CPU time required to process a SAS data file.

Setting Default Buffer Sizes to Favor Sequential Processing

Release 6.06 chose buffer (also called page) sizes to minimize the amount of wasted space within a page and to keep page sizes as small as possible. (Wasted space is free space at the end of a page that is not sufficient to hold another observation.) The Release 6.06 algorithm attempts to find the smallest page size that wastes no more than five percent of its total space. This algorithm reduces disk space and memory consumption at the expense of CPU performance.
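The Release 6.06 rule described above (smallest page size that wastes no more than five percent of the page) can be illustrated with a short Python sketch. This is not the SAS System's code: the function name, the 512-byte step between candidate page sizes, and the decision to ignore per-page and per-observation overhead are assumptions made only for the illustration.

```python
def smallest_page_size(obs_len, step=512, max_waste=0.05):
    """Return the smallest candidate page size whose unusable tail is
    at most max_waste of the page.

    Assumptions (not from the paper): candidate page sizes are multiples
    of `step`, and page headers and per-observation overhead are ignored.
    """
    page = step
    while True:
        if page >= obs_len:
            # Bytes left at the end of the page that cannot hold another observation.
            wasted = page % obs_len
            if wasted <= max_waste * page:
                return page
        page += step


if __name__ == "__main__":
    for length in (80, 100, 300):
        print(length, smallest_page_size(length))
```

Under these assumptions the search favors small pages that happen to fit the observation length well, which is the behavior the text contrasts with the CPU-oriented defaults of Release 6.07.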
Release 6.07 chooses default buffer sizes to minimize the consumption of CPU time during sequential processing. On all three platforms, increasing the number of observations per page decreases the amount of CPU time required to process a file sequentially. The CPU performance improvements are dramatic up to a point, but then they taper off. The optimal points for the MVS, CMS, and VMS operating systems are independent of the size of the observations.
To understand how the new page-size algorithm works, consider the following data gathered under MVS. Optimal CPU performance was achieved at 80 observations per page. Thus, the optimal page size for a file with observations 100 bytes long is 8000 bytes. The optimal page size for a file with observations 50 bytes long is 4000 bytes. Of course, the SAS System must round these optimal page sizes up to accommodate operating system constraints such as block size and a small amount of SAS System overhead.

Larger page sizes can have negative consequences when memory is scarce because the larger default page size may mean you will not have enough memory to read a SAS data file in all of your applications. If memory is a more valuable resource for your application than CPU time, you may want to use the BUFSIZE= option to specify a particular page size.

Larger page sizes can also have negative consequences for an application that accesses data in a random pattern. One example of such an application is the use of the POINT= option in the DATA step's SET statement. Access will be random when the value of the variable specified in the POINT= option varies by a large amount from one execution of the SET statement to the next. Another example of such an application is the use of varying observation numbers on the command line of the FSEDIT procedure to update observations in a random pattern. The negative effects of large page sizes on random access applications are exaggerated in a SAS server (a SAS session executing the SERVER procedure of SAS/SHARE® software) by the large number of opens being processed concurrently.

It is difficult to determine an optimum page size for an application in which data is accessed randomly. You must estimate how many observations on each file page are likely to be used while each page is in memory.
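The page-size arithmetic above, and the estimate needed for random access, can be sketched in Python. The figure of 80 observations per page is the MVS optimum quoted in the text; the function names, the 512-byte block size, and the zero default for SAS System overhead are hypothetical values chosen for the example.

```python
OBS_PER_PAGE_MVS = 80  # optimum reported in the text for MVS


def default_page_size(obs_len, block_size=512, overhead=0):
    """Page size that holds 80 observations, rounded up to a whole number of blocks.

    block_size and overhead are assumptions; the paper says only that the
    size is rounded up for block size and a small amount of overhead.
    """
    raw = OBS_PER_PAGE_MVS * obs_len + overhead
    blocks = -(-raw // block_size)  # integer ceiling division
    return blocks * block_size


def wasted_fraction(page_size, obs_len, obs_used_per_page):
    """Fraction of each page read that holds observations the application never uses."""
    return 1 - (obs_used_per_page * obs_len) / page_size


if __name__ == "__main__":
    page = default_page_size(100)
    print(page)                           # 8000 bytes rounded up to whole blocks
    print(wasted_fraction(page, 100, 1))  # one useful observation per page read
```

With one useful observation per page, nearly the whole page is read for nothing, which is why a smaller BUFSIZE= value may suit an unpredictable access pattern.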
If your application uses (or can be programmed to use) clusters of observations, you may be able to select a page size that groups all of the observations in a cluster on the same file page. On the other hand, if your application accesses data in no predictable pattern, a smaller page size will minimize the amount of I/O and CPU time wasted by reading the unused observations on each file page.

SAS servers fall into the last category because the access pattern of an online data-entry or data-update application can be impossible to predict. The amount of I/O time spent reading unused observations can be significant in a SAS server's execution, so you should be especially sensitive to the page sizes of files accessed through SAS servers by your applications.

NEW FILE FORMATS

In Release 6.06 the size of a SAS data file was greater than the size of a similar file in Version 5 because the Release 6.06 file format required 12 bytes of overhead for each observation. For SAS data files with a small record length (that is, with few variables), this 12-byte overhead could be significant. File compression was introduced as a way of minimizing the impact of larger file sizes on disk usage, but many sites were unable to absorb the extra CPU cost of compressing and decompressing the observations within a SAS data file.

Because of customer concerns over both the increased size of noncompressed files and the CPU cost of reading compressed files, Release 6.07 introduces new file formats for both noncompressed and compressed files.

Lean File Format for Noncompressed Files

To achieve more efficient I/O processing by decreasing file sizes, Release 6.07 introduces a lean format for noncompressed files. The lean file format reduces the overhead associated with each observation from 12 bytes to a single bit. The bit associated with each observation in the lean format flags deleted observations.
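The effect of cutting the per-observation overhead from 12 bytes to 1 bit can be estimated with a few lines of Python. The sketch below is only an approximation under stated assumptions: the 6144-byte page size and the 100,000-observation file are illustrative inputs, page header space is ignored, and the function names are hypothetical. It does not reproduce the paper's measured figures.

```python
def obs_per_page(page_size, obs_len, overhead_bits):
    """Observations that fit on one page, given per-observation overhead in bits."""
    return (page_size * 8) // (obs_len * 8 + overhead_bits)


def pages_needed(n_obs, page_size, obs_len, overhead_bits):
    """Whole pages needed to hold n_obs observations."""
    per_page = obs_per_page(page_size, obs_len, overhead_bits)
    return -(-n_obs // per_page)  # integer ceiling division


if __name__ == "__main__":
    PAGE = 6144      # assumed page size in bytes
    N = 100_000      # assumed number of observations
    for obs_len in (8, 200):
        old = pages_needed(N, PAGE, obs_len, overhead_bits=12 * 8)  # Release 6.06 style
        new = pages_needed(N, PAGE, obs_len, overhead_bits=1)       # lean format
        print(obs_len, old, new, round(100 * (old - new) / old, 1))
```

The estimate shows the same pattern the text describes: the shorter the observation, the larger the share of each page that the 12-byte overhead used to consume.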
The Release 6.07 lean file format is in all cases an improvement over the format in Release 6.06. As Table 1 shows, under MVS and CMS, the new format is also an improvement over the Version 5 format.

Table 1  Overhead per Observation by Release

   Release of the SAS System    Overhead per observation
   Version 5                    4 bytes (MVS)
                                4 bytes (CMS)
                                0 bytes (VMS)
   6.06*                        12 bytes
   6.07*                        1 bit

   * All three operating systems

The percentage of improvement in file size from Release 6.06 to Release 6.07 varies depending on the size of the observations and on the number of observations in the file. As Table 2 illustrates, the decreased file size is most significant for files with small observations.

Table 2  Effect of Observation Length on Lean File Format

                         Number of Pages*
   Observation Size      Release 6.06    Release 6.07    Percent Improvement

   * The data sets contained [...] observations each, and the page size was held constant at 6144 to aid comparison.

All features available with the Release 6.06 file format are available with the Release 6.07 lean format. Release 6.07 can create both formats. A site that is sharing data between Releases 6.06 and 6.07 will want to use the 6.06 format, but all other sites will want the enhanced 6.07 format. For information on specifying the file format you want, refer to the section Specifying a File Format, later in this paper.

Note that the lean format does not apply to compressed files because the 12-byte overhead per observation is needed to manage compressed data. Because of this 12-byte overhead, a compressed version of a file can be larger than a noncompressed version of the same file. Compression must save an average of 12 or more bytes per observation for the compressed file to be smaller than a noncompressed lean file. When you create a compressed file, Release 6.07 prints a note to the log that tells you how much space you saved (or lost) by compressing the SAS data file.
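The break-even rule for compression can be checked with simple arithmetic. The Python sketch below is an illustration only: the function name is hypothetical, the single bit of lean-format overhead is ignored, and the average compressed length is an input you would have to measure on your own data.

```python
def compression_savings(obs_len, avg_compressed_len, overhead=12):
    """Percent of per-observation space saved by compressing.

    A negative result means the compressed file is larger than the
    noncompressed lean file. The 12-byte overhead comes from the text;
    the 1-bit lean overhead is ignored (an assumption for simplicity).
    """
    compressed = avg_compressed_len + overhead
    return 100.0 * (obs_len - compressed) / obs_len


if __name__ == "__main__":
    print(compression_savings(80, 20))  # compresses well: space is saved
    print(compression_savings(80, 75))  # saves only 5 bytes: file grows
```

An observation that compresses by fewer than 12 bytes on average produces a negative result, which corresponds to the "lost" case reported in the log note.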
Faster Compressed Files

Many users of Release 6.06 were happy with the amount of disk space saved when they compressed their files. Some users, however, were unable to use compression because of the CPU cost involved in decompressing the file every time it was read. Release 6.07 introduces a new compressed file format that decompresses faster than the Release 6.06 format.

Both compressed formats replace repeated bytes within an observation buffer with a repetition factor and a single occurrence of the byte. The old format prefixes all compressed fields with an escape character. The new format prefixes both noncompressed and compressed fields with a length. The two methods do not differ in the amount of time required to compress a file; however, the difference in the amount of time required to decompress a file can be substantial. The old algorithm must search all uncompressed data for the escape character that prefixes a compressed field. The new algorithm does not need to scan for an escape character because all fields begin with a length specification. The improvement you see reading compressed files will vary depending on how many uncompressible fields an observation buffer contains and on the length of these uncompressible fields.

To measure the effects of the new compressed file format, consider the four SAS data sets described in Table 3. Each of these data sets contains 50,000 80-byte observations. The differences are in the contents of the observations.

Table 3  Four Different Compressed Data Sets

   Data Set Name    Contents of Data Set
   ALL              Contains almost all compressible data, including one 72-byte blank character variable. This data set also contains an 8-byte noncompressible variable.
   AVERAGE          Contains a mixture of compressible and noncompressible data, including a 20-byte compressible field, a 10-byte noncompressible field, a 10-byte compressible field, a 32-byte noncompressible field, and a numeric variable with values ranging from 1 to 50,000.
   MISSING          Contains all missing values, including ten numeric variables with all missing values.
   NONE             Contains almost all noncompressible data, including one 72-byte noncompressible variable. This data set also contains an 8-byte compressible variable.

Table 4 shows the amount of CPU time required to decompress each of these data sets from the Release 6.06 and Release 6.07 compressed formats.
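The difference between the two decompression strategies described above can be sketched in Python. The byte layouts here are invented for the illustration and are not the actual SAS file formats: the escape byte value, the one-byte repetition count, and the use of the high bit of a control byte to mark a run are all assumptions. The escape-based decoder also assumes that the escape byte never occurs in literal data.

```python
ESC = 0x1B  # hypothetical escape byte marking a compressed run


def decode_escape(buf: bytes) -> bytes:
    """Old-style sketch: ESC, count, value marks a run; all other bytes are literal."""
    out = bytearray()
    i = 0
    while i < len(buf):
        if buf[i] == ESC:                  # every byte has to be tested for ESC
            count, value = buf[i + 1], buf[i + 2]
            out += bytes([value]) * count
            i += 3
        else:
            out.append(buf[i])             # literal data is copied byte by byte
            i += 1
    return bytes(out)


def decode_length_prefixed(buf: bytes) -> bytes:
    """New-style sketch: every field starts with a control byte holding its length."""
    out = bytearray()
    i = 0
    while i < len(buf):
        ctrl = buf[i]
        n = ctrl & 0x7F
        if ctrl & 0x80:                    # high bit set: run of n copies of the next byte
            out += bytes([buf[i + 1]]) * n
            i += 2
        else:                              # literal field: copy n bytes in one slice
            out += buf[i + 1:i + 1 + n]
            i += 1 + n
    return bytes(out)


if __name__ == "__main__":
    record = b"ABCDEFGH" + b" " * 72       # 8 noncompressible bytes plus 72 blanks
    old = b"ABCDEFGH" + bytes([ESC, 72, 0x20])
    new = bytes([8]) + b"ABCDEFGH" + bytes([0x80 | 72, 0x20])
    print(decode_escape(old) == record, decode_length_prefixed(new) == record)
```

In the sketch, the escape-based decoder inspects every literal byte, while the length-prefixed decoder skips over a literal field in one step. That is consistent with the text's observation that the gain depends on how much uncompressible data an observation contains.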
Table 4  Comparison of CPU Usage while Decompressing Files

   Data Set Name    Release 6.06 Format    Release 6.07 Format    Percent Improvement
   ALL
   AVERAGE
   MISSING
   NONE

The SAS data sets MISSING, ALL, and NONE illustrate extreme cases of decompression while the data set AVERAGE is an average case. In every case, decompression of the new compressed file format outperforms decompression of the Release 6.06 compressed file format. The performance improvement is most dramatic in the case with no compressible data, and it is not significant when the entire observation is compressible.

All features available with the Release 6.06 compressed file format are available with the Release 6.07 compressed format. Release 6.07 can create both formats. A site that is sharing data between Releases 6.06 and 6.07 will want to use the 6.06 format, but all other sites will want the enhanced 6.07 format. For information on specifying the file format you want, refer to the next section, Specifying a File Format.

Specifying a File Format

Release 6.07 produces new formats of compressed and noncompressed files. Although it uses the new formats by default, Release 6.07 can transparently read Release 6.06 formats. If you need to share SAS data files between Releases 6.06 and 6.07, you must force Release 6.07 to create Release 6.06 file formats.

Release 6.07 provides several ways of specifying the file format you want. The different methods offer you varying degrees of control. For example, your SAS Site Representative can set the default engine for the entire site, but you can specify a different default engine for your own SAS session. Table 5 shows the five ways you can specify a file format.
Table 5  Specifying a File Format

   Option or Argument    Location                   Data Sets for which Default Engine Is Set
   ENGINE=               site configuration file    all data sets created at site
   ENGINE=               SAS invocation             all data sets created during that SAS session
   V606 | V607           LIBNAME statement          all data sets in the specified library
   FILEFMT=              LIBNAME statement          all data sets in the specified library
   FILEFMT=              data set option            the one data set being opened

For example, if a site representative wants to set the default file format for the entire site to the Release 6.06 format, he or she can add the following option to the site configuration file:

   engine=v606

Note that the format of the configuration file is system-dependent. See the SAS documentation for your operating system for details.

A user who wants to set the default for a single SAS session can do so when invoking the SAS System. For instance, under MVS, the user starts the SAS System with the following command:

   sas options(engine=v606)

Now, if the same user wants to set the default for a library to the Release 6.06 format, he or she can do so with either of the following LIBNAME statements:

   libname perm 'external-file-name' filefmt=606;
   libname perm v606 'external-file-name';

Finally, a user who wants to create an individual file in the Release 6.06 format can use a SAS data set option:

   data perm.a(filefmt=606);

NOTE: In Release 6.07, the FILEFMT= and ENGINE= options along with the name of an engine in the LIBNAME statement control the format used for new files. These options are useful only for sites that need to read and write the same SAS data sets from both Releases 6.06 and 6.07. No option is necessary for Release 6.07 to read and modify Release 6.06 data sets.
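The scopes listed above, from the whole site down to a single data set, suggest a simple resolution order. The Python sketch below assumes that the most specific setting wins, which the paper implies but does not state outright. The function and parameter names are hypothetical, and the strings "V606" and "V607" merely stand in for the two file formats.

```python
def effective_format(site=None, session=None, library=None, data_set=None,
                     default="V607"):
    """Pick the file format for a new data set.

    Assumption (not stated in the paper): the narrowest scope that
    specifies a format overrides every broader scope. With nothing
    specified, Release 6.07 uses its own new format.
    """
    for setting in (data_set, library, session, site):
        if setting is not None:
            return setting
    return default


if __name__ == "__main__":
    print(effective_format())                                # nothing set
    print(effective_format(site="V606"))                     # site-wide default
    print(effective_format(site="V606", session="V607"))     # session overrides site
    print(effective_format(library="V607", data_set="V606")) # data set option overrides library
```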
IMPROVED MEMORY USAGE IN THE DATA STEP

In Release 6.06, a DATA step that read several files required enough memory to hold the variable descriptor information for all of the files being read. Release 6.07 requires only enough memory to hold descriptors for the file with the most variables. Consequently, DATA steps that ran out of memory reading lots of files in Release 6.06 should run to completion in Release 6.07. Figure 3 compares the amount of memory required to execute the following DATA step. This DATA step reads four SAS data sets with 1,000 variables each in Releases 5.18, 6.06, and 6.07.

   data _null_;
      set [...];

Figure 3  Comparison of Memory Requirements (kilobytes by release; the plotted values are 1494K, 365K, and 176K)

INDEXING PERFORMANCE

Release 6.06 introduced indexes as a tool for tuning performance. In some applications, indexes have made a dramatic improvement. For details on the effects of using indexes, refer to "Effective Use of Indexes in the SAS System" in the Sixteenth Annual SAS Users Group International Conference Proceedings. In the interest of making a good thing better, Release 6.07 makes the following improvements to indexing:

- Indexes created by Release 6.07 are 20%-30% smaller than they were in Release 6.06.
- Creating an index in Release 6.07 takes half the CPU time and half the I/O time compared to Release 6.06.
- The algorithm for choosing an index for WHERE-clause optimization has been improved to take into account the BUFNO= option and file compression.
- Additional types of WHERE queries are optimized.

   Query                  Example
   SUBSTR functions       where substr(lname,1,3)='smi';
   CONTAINS operations    where lname contains ('Smi');
   LIKE operations        where lname like ('%Rob_%');
   Truncated operators    where lname gt: 'Sm';

WHERE-CLAUSE PERFORMANCE

Version 6 of the SAS System introduced the WHERE clause as a general method of subsetting a SAS data set.
The WHERE clause is similar to the DATA step's subsetting IF statement, but it has several advantages over subsetting IF:

- You can use the WHERE clause outside of the DATA step
     with procedures
     on the PROC FSEDIT and PROC FSVIEW command lines
     in SCL programs
     as a data set option.
- The WHERE clause can be optimized with an index.
- The WHERE clause allows two more operators: LIKE and CONTAINS.

WHERE clauses without index optimization are not as fast as a subsetting IF statement in Release 6.06. However, index-optimized WHERE clauses are generally much faster than subsetting IF statements. In Release 6.07, the unoptimized WHERE-clause performance matches that of the subsetting IF statement for most cases.

Consider an example SAS data set with 500,000 observations and 1 variable:

   data a;
      do x=1 to 500000;
         output;
      end;
   run;

Now consider a simple and a complex query on this file:

Example 1: Simple Query

   data _null_;
      set a;
      if x=1;
   run;

   data _null_;
      set a;
      where x=1;
   run;

Example 2: Complex Query

   data _null_;
      set a;
      if x=1 or x=3 or x=5 or x=7 or
         x=9 or x=11 or x=13 or x=15;
   run;

   data _null_;
      set a;
      where x=1 or x=3 or x=5 or x=7 or
            x=9 or x=11 or x=13 or x=15;
   run;

Table 6 compares these simple and complex queries as subsetting IF statements and as WHERE clauses in Release 6.06 and Release 6.07.

Table 6  Comparison of CPU Usage by the Subsetting IF Statement and the WHERE Clause

                 Release 6.06                   Release 6.07
   Query Type    IF    WHERE    INDEX WHERE     IF    WHERE    INDEX WHERE
   simple
   complex

For a simple query, the WHERE clause in Release 6.07 is more efficient than the subsetting IF statement. The complex query shows a lot of improvement between releases, but the WHERE clause is still slower than the subsetting IF statement for the complex query.

Note that both the simple and the complex WHERE queries can be index optimized. An index-optimized WHERE clause will outperform a subsetting IF in all cases. In general, the WHERE clause is the recommended method for performing queries with the SAS System.
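Why an index-optimized query can beat any test that visits every observation can be shown with a small Python sketch. It is a conceptual illustration only: the sorted list of (value, observation number) pairs stands in for an index and is not a description of how SAS indexes are stored, and all function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def scan(values, target):
    """Visit every observation, as a subsetting IF or an unoptimized WHERE must."""
    return [obs for obs, v in enumerate(values) if v == target]


def build_index(values):
    """Stand-in for an index: (value, observation number) pairs in sorted order."""
    return sorted((v, obs) for obs, v in enumerate(values))


def index_lookup(index, target):
    """Binary-search the index and touch only the matching entries."""
    lo = bisect_left(index, (target, -1))
    hi = bisect_right(index, (target, len(index)))
    return [obs for _, obs in index[lo:hi]]


if __name__ == "__main__":
    x = [5, 1, 3, 1, 9]
    idx = build_index(x)
    print(scan(x, 1), index_lookup(idx, 1))
```

The scan does work proportional to the number of observations for every query, while the lookup does a binary search plus one step per match. The price is the cost of building and storing the index.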
For complex queries on large SAS data sets where CPU performance is critical, you may want to compare the performance of the subsetting IF statement and WHERE clause before deciding between the two. Flexibility and (usually) better performance make the WHERE clause the better choice for most applications in Release 6.07.

SORTEDBY SUPPORT

Release 6.07 stores a sort indicator with a SAS data file. The sort indicator expresses how the data are sorted. The SORT procedure automatically sets the SORTEDBY indicator when it finishes sorting a file. You can manually set the SORTEDBY indicator with the SORTEDBY= data set option.

The sort indicator enhances the performance of some applications by bypassing unnecessary sorts. Consider an application that reads a SAS data set that is sometimes sorted. This application begins with the SORT procedure to ensure the data are sorted correctly. In Releases 5.18 and 6.06, this application incurs the overhead of sorting all the time. In Release 6.07, the SORT procedure recognizes that the file is already sorted and bypasses any unnecessary sorts.

The value of the sort indicator is automatically synchronized with the data in a SAS data file. The SAS System turns off the sort indicator when you

- add a new observation to the SAS data file
- update an observation to change the value of one or more of the variables specified in the sort indicator (updates that do not affect the sort order do not turn off the sort indicator)
- turn it off with the DATASETS procedure.

Release 6.07 uses the sort indicator in the following situations:

- with the SORT procedure. PROC SORT does not sort a file that is already sorted.
- with certain types of SQL joins. These joins are optimized when the data are sorted.
- with the CONTENTS procedure. PROC CONTENTS reports the sort order.
- with the BY statement. The BY statement uses the sort-indicator order instead of an index.
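The bookkeeping behind a sort indicator can be sketched in Python. The class below is a toy model written for this illustration, not the SAS System's implementation: the class and method names are hypothetical, rows are plain dictionaries, and only an exact match of the sort keys counts as already sorted.

```python
class DataFile:
    """Toy data file that remembers which variables it is sorted by."""

    def __init__(self, rows):
        self.rows = list(rows)   # each row is a dict of variable -> value
        self.sorted_by = None    # tuple of variable names, or None if unknown

    def sort(self, *keys):
        """Sort by keys; return False when the indicator shows the work can be skipped."""
        if self.sorted_by == keys:
            return False
        self.rows.sort(key=lambda row: tuple(row[k] for k in keys))
        self.sorted_by = keys
        return True

    def append(self, row):
        """Adding an observation turns the indicator off."""
        self.rows.append(row)
        self.sorted_by = None

    def update(self, pos, **changes):
        """Turn the indicator off only if a sort variable actually changes value."""
        if self.sorted_by and any(
            k in self.sorted_by and self.rows[pos][k] != v
            for k, v in changes.items()
        ):
            self.sorted_by = None
        self.rows[pos].update(changes)


if __name__ == "__main__":
    f = DataFile([{"x": 2, "y": "b"}, {"x": 1, "y": "a"}])
    print(f.sort("x"))   # sorts the rows
    print(f.sort("x"))   # skipped: the indicator says the file is sorted by x
    f.update(0, y="z")   # y is not a sort variable, indicator stays on
    f.update(0, x=5)     # a sort variable changed, indicator is turned off
    print(f.sorted_by)
```

The second call to `sort` returns without doing any work, which is the saving the text describes for an application that always begins with a sort.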
CONCLUSIONS

Release 6.07 of the SAS System provides all the enhancements of Release 6.06 plus additional capabilities, while matching or bettering the performance of Release [...]. With Release 6.07, SAS Institute demonstrates its commitment to be on the leading edge of software technology without sacrificing performance or efficiency.

REFERENCES

Beatrous, Steve and Armstrong, Karen (1991), "Effective Use of Indexes in the SAS System," Proceedings of the Sixteenth Annual SAS Users Group International Conference, pp. [...].

SAS and SAS/SHARE are registered trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.
More informationCHAPTER 7 Using Other SAS Software Products
77 CHAPTER 7 Using Other SAS Software Products Introduction 77 Using SAS DATA Step Features in SCL 78 Statements 78 Functions 79 Variables 79 Numeric Variables 79 Character Variables 79 Expressions 80
More informationSAS ENTERPRISE GUIDE USER INTERFACE
Paper 294-2008 What s New in the 4.2 releases of SAS Enterprise Guide and the SAS Add-In for Microsoft Office I-kong Fu, Lina Clover, and Anand Chitale, SAS Institute Inc., Cary, NC ABSTRACT SAS Enterprise
More informationFSEDIT Procedure Windows
25 CHAPTER 4 FSEDIT Procedure Windows Overview 26 Viewing and Editing Observations 26 How the Control Level Affects Editing 27 Scrolling 28 Adding Observations 28 Entering and Editing Variable Values 28
More informationChecking for Duplicates Wendi L. Wright
Checking for Duplicates Wendi L. Wright ABSTRACT This introductory level paper demonstrates a quick way to find duplicates in a dataset (with both simple and complex keys). It discusses what to do when
More informationSAS Scalable Performance Data Server 4.3
Scalability Solution for SAS Dynamic Cluster Tables A SAS White Paper Table of Contents Introduction...1 Cluster Tables... 1 Dynamic Cluster Table Loading Benefits... 2 Commands for Creating and Undoing
More informationChapter 7 File Access. Chapter Table of Contents
Chapter 7 File Access Chapter Table of Contents OVERVIEW...105 REFERRING TO AN EXTERNAL FILE...105 TypesofExternalFiles...106 READING FROM AN EXTERNAL FILE...107 UsingtheINFILEStatement...107 UsingtheINPUTStatement...108
More informationSAS/FSP 9.2. Procedures Guide
SAS/FSP 9.2 Procedures Guide The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2008. SAS/FSP 9.2 Procedures Guide. Cary, NC: SAS Institute Inc. SAS/FSP 9.2 Procedures
More informationData Set Options CHAPTER 2
5 CHAPTER 2 Data Set Options Definition 6 6 Using Data Set Options 6 Using Data Set Options with Input or Output SAS Data Sets 6 How Data Set Options Interact with System Options 7 Data Set Options by
More informationBuilding a Data Warehouse with SAS Software in the Unix Environment
Building a Data Warehouse with SAS Software in the Unix Environment Karen Grippo, Dun & Bradstreet, Basking Ridge, NJ John Chen, Dun & Bradstreet, Basking Ridge, NJ Lisa Brown, SAS Institute Inc., Cary,
More informationSESUG Paper AD A SAS macro replacement for Dynamic Data Exchange (DDE) for use with SAS grid
SESUG Paper AD-109-2017 A macro replacement for Dynamic Data Exchange (DDE) for use with grid ABSTRACT Saki Kinney, David Wilson, and Benjamin Carper, RTI International The ability to write to specific
More informationEvaluating the migration of a SAS application from a VAX to a PC-based NT network Alan T. Pasquino, Pfizer, Inc. Don J. Fish, Pfizer, Inc.
Evaluating the migration of a SAS application from a VAX to a PC-based NT network Alan T. Pasquino, Pfizer, Inc. Don J. Fish, Pfizer, Inc. Abstract: Over a period of several years, we have developed an
More informationCSE 530A. B+ Trees. Washington University Fall 2013
CSE 530A B+ Trees Washington University Fall 2013 B Trees A B tree is an ordered (non-binary) tree where the internal nodes can have a varying number of child nodes (within some range) B Trees When a key
More informationBeginning Tutorials. Introduction to SAS/FSP in Version 8 Terry Fain, RAND, Santa Monica, California Cyndie Gareleck, RAND, Santa Monica, California
Introduction to SAS/FSP in Version 8 Terry Fain, RAND, Santa Monica, California Cyndie Gareleck, RAND, Santa Monica, California ABSTRACT SAS/FSP is a set of procedures used to perform full-screen interactive
More informationChapter 12: Indexing and Hashing. Basic Concepts
Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition
More informationExtreme Storage Performance with exflash DIMM and AMPS
Extreme Storage Performance with exflash DIMM and AMPS 214 by 6East Technologies, Inc. and Lenovo Corporation All trademarks or registered trademarks mentioned here are the property of their respective
More informationSAS Data Libraries. Definition CHAPTER 26
385 CHAPTER 26 SAS Data Libraries Definition 385 Library Engines 387 Library Names 388 Physical Names and Logical Names (Librefs) 388 Assigning Librefs 388 Associating and Clearing Logical Names (Librefs)
More informationSAS Performance Tuning Strategies and Techniques
SAS Performance Tuning Strategies and Techniques Kirk Paul Lafler, Software Intelligence Corporation, Spring Valley, CA ABSTRACT As SAS Software becomes increasingly more popular, guidelines for its efficient
More informationUncommon Techniques for Common Variables
Paper 11863-2016 Uncommon Techniques for Common Variables Christopher J. Bost, MDRC, New York, NY ABSTRACT If a variable occurs in more than one data set being merged, the last value (from the variable
More informationChapter 12: Indexing and Hashing
Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL
More informationDATABASE SCALABILITY AND CLUSTERING
WHITE PAPER DATABASE SCALABILITY AND CLUSTERING As application architectures become increasingly dependent on distributed communication and processing, it is extremely important to understand where the
More informationDatabase Architectures
Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/15/15 Agenda Check-in Parallelism and Distributed Databases Technology Research Project Introduction to NoSQL
More informationBS2000/OSD DAB Disk Access Buffer Intelligent Caching with AutoDAB
BS2000/OSD DAB Disk Access Buffer Intelligent Caching with AutoDAB Issue June 2009 Pages 7 To cache or not to cache? That is not the question! Business-critical computing is typified by high performance
More informationPart 1: Indexes for Big Data
JethroData Making Interactive BI for Big Data a Reality Technical White Paper This white paper explains how JethroData can help you achieve a truly interactive interactive response time for BI on big data,
More informationA Simulation: Improving Throughput and Reducing PCI Bus Traffic by. Caching Server Requests using a Network Processor with Memory
Shawn Koch Mark Doughty ELEC 525 4/23/02 A Simulation: Improving Throughput and Reducing PCI Bus Traffic by Caching Server Requests using a Network Processor with Memory 1 Motivation and Concept The goal
More informationChapter 11: Indexing and Hashing
Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL
More informationDATA Step in SAS Viya : Essential New Features
Paper SAS118-2017 DATA Step in SAS Viya : Essential New Features Jason Secosky, SAS Institute Inc., Cary, NC ABSTRACT The is the familiar and powerful data processing language in SAS and now SAS Viya.
More informationCSE 530A. Query Planning. Washington University Fall 2013
CSE 530A Query Planning Washington University Fall 2013 Scanning When finding data in a relation, we've seen two types of scans Table scan Index scan There is a third common way Bitmap scan Bitmap Scans
More informationImproving VSAM Application Performance with IAM
Improving VSAM Application Performance with IAM Richard Morse Innovation Data Processing August 16, 2004 Session 8422 This session presents at the technical concept level, how IAM improves the performance
More informationMaximizing SAS Software Performance Under the Unix Operating System
Maximizing SAS Software Performance Under the Unix Operating System Daniel McLaren, Henry Ford Health system, Detroit, MI George W. Divine, Henry Ford Health System, Detroit, MI Abstract The Unix operating
More informationA transaction is a sequence of one or more processing steps. It refers to database objects such as tables, views, joins and so forth.
1 2 A transaction is a sequence of one or more processing steps. It refers to database objects such as tables, views, joins and so forth. Here, the following properties must be fulfilled: Indivisibility
More informationUsing Data Transfer Services
103 CHAPTER 16 Using Data Transfer Services Introduction 103 Benefits of Data Transfer Services 103 Considerations for Using Data Transfer Services 104 Introduction For many applications, data transfer
More informationAdministração e Optimização de Bases de Dados 2012/2013 Index Tuning
Administração e Optimização de Bases de Dados 2012/2013 Index Tuning Bruno Martins DEI@Técnico e DMIR@INESC-ID Index An index is a data structure that supports efficient access to data Condition on Index
More informationLecture #10 Context Switching & Performance Optimization
SPRING 2015 Integrated Technical Education Cluster At AlAmeeria E-626-A Real-Time Embedded Systems (RTES) Lecture #10 Context Switching & Performance Optimization Instructor: Dr. Ahmad El-Banna Agenda
More informationProgramming Beyond the Basics. Find() the power of Hash - How, Why and When to use the SAS Hash Object John Blackwell
Find() the power of Hash - How, Why and When to use the SAS Hash Object John Blackwell ABSTRACT The SAS hash object has come of age in SAS 9.2, giving the SAS programmer the ability to quickly do things
More informationSAS Viya 3.1 FAQ for Processing UTF-8 Data
SAS Viya 3.1 FAQ for Processing UTF-8 Data Troubleshooting Tips for Processing UTF-8 Data (Existing SAS Code) What Is the Encoding of My Data Set? PROC CONTENTS displays information about the data set
More informationDatabase Architectures
Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 11/15/12 Agenda Check-in Centralized and Client-Server Models Parallelism Distributed Databases Homework 6 Check-in
More informationCharacteristics of a "Successful" Application.
Characteristics of a "Successful" Application. Caroline Bahler, Meridian Software, Inc. Abstract An application can be judged "successful" by two different sets of criteria. The first set of criteria belongs
More informationSTATION
------------------------------STATION 1------------------------------ 1. Which of the following statements displays all user-defined macro variables in the SAS log? a) %put user=; b) %put user; c) %put
More informationQuery Execution [15]
CSC 661, Principles of Database Systems Query Execution [15] Dr. Kalpakis http://www.csee.umbc.edu/~kalpakis/courses/661 Query processing involves Query processing compilation parsing to construct parse
More informationOpenVMS Operating Environment
81 CHAPTER 11 OpenVMS Operating Environment Listing OpenVMS System File Attributes 81 Specifying File Attributes for OpenVMS 82 Determining the SAS Release Used to Create a Member for OpenVMS 82 Mounting
More informationThis chapter is recommended primarily for server administrators.
27 CHAPTER 3 Starting and Managing a SAS/ SHARE Server Audience 27 Starting a Server: A Fast-Track Approach 27 Specifying a Communications Access Method 28 Pre-Defining SAS Data Libraries to the Server
More informationQUEST Procedure Reference
111 CHAPTER 9 QUEST Procedure Reference Introduction 111 QUEST Procedure Syntax 111 Description 112 PROC QUEST Statement Options 112 Procedure Statements 112 SYSTEM 2000 Statement 114 ECHO ON and ECHO
More informationExadata X3 in action: Measuring Smart Scan efficiency with AWR. Franck Pachot Senior Consultant
Exadata X3 in action: Measuring Smart Scan efficiency with AWR Franck Pachot Senior Consultant 16 March 2013 1 Exadata X3 in action: Measuring Smart Scan efficiency with AWR Exadata comes with new statistics
More informationAPPENDIX 4 Migrating from QMF to SAS/ ASSIST Software. Each of these steps can be executed independently.
255 APPENDIX 4 Migrating from QMF to SAS/ ASSIST Software Introduction 255 Generating a QMF Export Procedure 255 Exporting Queries from QMF 257 Importing QMF Queries into Query and Reporting 257 Alternate
More information%DWFK$&&(66WR $'$%$6%$$ E\ 6WXDUW%LUFK IURP,QIRUPDWLRQ'HOLYHU\ 6\VWHPV6RXWK$IULFD
%DWFK$&&(66WR $'$%$6%$$ E\ 6WXDUW%LUFK IURP,QIRUPDWLRQ'HOLYHU\ 6\VWHPV6RXWK$IULFD 1 ,QWURGXFWLRQ O Objectives and Benefits O Applicable Environment O Terms and Definitions O System Components Objectives
More informationTotalCost = 3 (1, , 000) = 6, 000
156 Chapter 12 HASH JOIN: Now both relations are the same size, so we can treat either one as the smaller relation. With 15 buffer pages the first scan of S splits it into 14 buckets, each containing about
More informationCHAPTER 7 Examples of Combining Compute Services and Data Transfer Services
55 CHAPTER 7 Examples of Combining Compute Services and Data Transfer Services Introduction 55 Example 1. Compute Services and Data Transfer Services Combined: Local and Remote Processing 56 Purpose 56
More informationFILE SYSTEMS, PART 2. CS124 Operating Systems Fall , Lecture 24
FILE SYSTEMS, PART 2 CS124 Operating Systems Fall 2017-2018, Lecture 24 2 Last Time: File Systems Introduced the concept of file systems Explored several ways of managing the contents of files Contiguous
More informationDBLOAD Procedure Reference
131 CHAPTER 10 DBLOAD Procedure Reference Introduction 131 Naming Limits in the DBLOAD Procedure 131 Case Sensitivity in the DBLOAD Procedure 132 DBLOAD Procedure 132 133 PROC DBLOAD Statement Options
More informationPROC FORMAT: USE OF THE CNTLIN OPTION FOR EFFICIENT PROGRAMMING
PROC FORMAT: USE OF THE CNTLIN OPTION FOR EFFICIENT PROGRAMMING Karuna Nerurkar and Andrea Robertson, GMIS Inc. ABSTRACT Proc Format can be a useful tool for improving programming efficiency. This paper
More informationBatch vs. Interactive: Why You Need Both Janet E. Stuelpner. ASG. Inc Cary. North Carolina
Batch vs. Interactive: Why You Need Both Janet E. Stuelpner. ASG. Inc Cary. North Carolina ABSTRACT error was small fa semi-colon was omitted or a closing quotation mark was missing), but caused the program
More informationBest Practices for Setting BIOS Parameters for Performance
White Paper Best Practices for Setting BIOS Parameters for Performance Cisco UCS E5-based M3 Servers May 2013 2014 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public. Page
More informationBases de Dades: introduction to SQL (indexes and transactions)
Bases de Dades: introduction to SQL (indexes and transactions) Andrew D. Bagdanov bagdanov@cvc.uab.es Departamento de Ciencias de la Computación Universidad Autónoma de Barcelona Fall, 2010 Questions from
More informationFile Size Distribution on UNIX Systems Then and Now
File Size Distribution on UNIX Systems Then and Now Andrew S. Tanenbaum, Jorrit N. Herder*, Herbert Bos Dept. of Computer Science Vrije Universiteit Amsterdam, The Netherlands {ast@cs.vu.nl, jnherder@cs.vu.nl,
More informationCreating and Executing Stored Compiled DATA Step Programs
465 CHAPTER 30 Creating and Executing Stored Compiled DATA Step Programs Definition 465 Uses for Stored Compiled DATA Step Programs 465 Restrictions and Requirements 466 How SAS Processes Stored Compiled
More informationDavid Beam, Systems Seminar Consultants, Inc., Madison, WI
Paper 150-26 INTRODUCTION TO PROC SQL David Beam, Systems Seminar Consultants, Inc., Madison, WI ABSTRACT PROC SQL is a powerful Base SAS Procedure that combines the functionality of DATA and PROC steps
More informationCleaning up your SAS log: Note Messages
Paper 9541-2016 Cleaning up your SAS log: Note Messages ABSTRACT Jennifer Srivastava, Quintiles Transnational Corporation, Durham, NC As a SAS programmer, you probably spend some of your time reading and
More informationFile System Interface and Implementation
Unit 8 Structure 8.1 Introduction Objectives 8.2 Concept of a File Attributes of a File Operations on Files Types of Files Structure of File 8.3 File Access Methods Sequential Access Direct Access Indexed
More informationProblem Set 2 Solutions
6.893 Problem Set 2 Solutons 1 Problem Set 2 Solutions The problem set was worth a total of 25 points. Points are shown in parentheses before the problem. Part 1 - Warmup (5 points total) 1. We studied
More information