Efficient Use of SAS® Data Set Indexes in SAS® Applications
Sally Painter, SAS Institute Inc., Cary, NC

ABSTRACT

By indexing your SAS data sets, you can run certain types of applications more efficiently. The ability to index SAS data sets is available in Release 6.06 of the SAS System. This paper discusses the costs associated with indexing a SAS data set and the types of applications where the benefits of having a SAS data set indexed outweigh the costs of implementing and maintaining the index. The target audience is users who will be designing applications for Release 6.06 and later releases of the SAS System.

INTRODUCTION

The ability to index SAS data sets is an enhancement in Version 6 of the SAS System. Having a SAS data set indexed can, in some cases, provide faster access to a subset of the data. A secondary benefit is that all values returned via an index are returned in sorted order. However, you should not create an index just to keep from sorting the data set. In most cases, the cost of maintaining the index outweighs the CPU time saved by eliminating the SORT procedure step. This paper describes the types of applications that are most likely to benefit from the use of an index.

WHAT AN INDEX IS

An index is an auxiliary data structure that does not appear as a separate file to the SAS System. On all operating systems except MVS, it does appear as a separate file to the operating system. You should never manipulate the index (move, delete, and so on) except with the SAS System.

Logically, the index is an inverted tree structure that contains data values and location identifiers for the values of a key variable or variables. The SAS index structure is implemented as a self-balanced tree, which means that every leaf is the same distance from the root. This is important because it provides a uniform cost to access the leaves.
In addition, any change to the SAS data set that affects the data values of the indexed variables will cause the index to be modified.

HOW AN INDEX IS BUILT

By diagramming the logic of the index structure for a particular case, you will be able to see why values returned via the index are returned in sorted order. Also, the diagram illustrates the concept of a balanced structure. Note: The internal structure of the index would look different.

Suppose that you have a variable named REGION that you want to be the key variable for your index. The values of REGION are N, E, S, and W. The number of levels in the index structure is based on the number of unique values of the key variable, which in this case is REGION. Since REGION has only 4 values, our logical diagram of the index will contain three levels: the root (level 2), the nodes (level 1), and the leaves (level 0).

Level 2 is the root of the index. There is always only one root. Level 1 contains the parent nodes. Each parent contains the highest value of REGION that lives under it. The parent node also contains a NID (child node identifier), which is the location of each child. Under each parent are two children. Each child, or leaf, contains a unique value of the key variable and the RID (record identifier). The complete set of level 0 nodes contains all values of the key variable. Also, if you read the leaves from left to right, you will notice that the values are in sorted order.

[Figure: logical diagram of the index on REGION, showing level 2 (root), level 1 (nodes), and level 0 (leaves)]

TYPES OF INDEXES

The SAS System under Version 6 supports two types of indexes: regular and composite. A regular index is based on the value of one key variable. The name of a regular index is the same as the name of the key variable on which the index is based. A composite index is based on the value of two or more key variables whose values are concatenated together to form one string.
When choosing a name for a composite index, you must select a name that is different from the name of any variable in the data set.

The first value in the concatenation of a composite index can also be used by the SAS System just as if it were a regular index. For example, suppose you have created a composite index on the key variables LNAME and FNAME (in this order) and are using LNAME for BY processing in your SAS application. The SAS System could use this composite index to retrieve observations BY LNAME since it is first in the concatenation.

In Release 6.06, all key variables of a composite index are used in BY processing. For example, if your composite index is composed of three key variables and you set your data set by the same three variables (using the SET and BY statements), the SAS System retrieves the observations via the index. For WHERE clause optimization, only the first key variable of the composite index is used, regardless of how many of the key variables are specified in the WHERE statement. An example of this is given later in this paper.

You can create multiple regular or composite indexes on any Version 6 disk format SAS data set. You can also have any combination of regular and composite indexes. However, be aware that having a large number of indexes on any data set can be costly in terms of upkeep.
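The LNAME and FNAME case described above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the data set WORK.STAFF, its DEPT variable, the sample values, and the index name FULLNAME are all hypothetical, and how the WHERE step is optimized depends on the release (the text describes Release 6.06 behavior).

```sas
/* Hypothetical sample data; only LNAME and FNAME come from the text. */
data work.staff;
   input lname $ fname $ dept $;
   datalines;
SMITH JOHN ACCT
JONES MARY SALES
SMITH ANNA SALES
ADAMS PAUL ACCT
;
run;

/* Composite index on LNAME and FNAME, in this order.              */
/* The index name (FULLNAME, hypothetical) must differ from every  */
/* variable name in the data set.                                  */
proc datasets library=work nolist;
   modify staff;
   index create fullname=(lname fname);
quit;

/* Ask for a log note when an index is used. */
options msglevel=i;

/* BY processing on the first key: the data set is not sorted,     */
/* so the observations are retrieved in LNAME order via the index. */
data by_lname;
   set work.staff;
   by lname;
run;

/* WHERE processing: per the text, Release 6.06 considers only the */
/* first key (LNAME) when deciding whether to use the index.       */
data smith_john;
   set work.staff;
   where lname = 'SMITH' and fname = 'JOHN';
run;
```

With MSGLEVEL=I set, the log is the place to check whether the index was actually chosen for each step; the decision remains with the SAS System.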
CREATING AN INDEX

When creating an index, you can also specify attributes. Valid attributes are:

UNIQUE   used when the key variable will not contain duplicate values in the data set. This attribute, once assigned, makes it impossible for duplicate values of the key variable to be added to the data set. This is handy when working with data that should not, for any reason, contain duplicates.

NOMISS   used when missing values will not be included in the index, but may exist in the data set. If the NOMISS attribute is specified, you cannot use a WHERE statement that expects to return missing values. Also, your index structure will be physically smaller because it will not have nodes to identify the missing values.

There are several ways to create an index on a SAS data set. You can:

- use the DATASETS procedure
- use the SQL procedure
- use the IML procedure
- create it interactively through the ACCESS window.

Note: Not all types and attributes are available with each method.

The syntax for creating or deleting an index using PROC DATASETS is

   PROC DATASETS LIBRARY=libref;
      MODIFY SAS_data_set;
      INDEX CREATE index_name / attributes;                   /* regular index   */
      INDEX CREATE index_name=(variable_list) / attributes;   /* composite index */
      INDEX DELETE index_name;

The syntax for creating or deleting an index using PROC SQL is

   PROC SQL;
      CREATE <UNIQUE> INDEX index_name ON SAS_data_set(key_variable(s));
      DROP INDEX index_name FROM SAS_data_set;

Using the ACCESS window, you should follow these steps:

1. type ACCESS on any SAS Display Manager System command line
2. enter a C beside the data set that will contain the index
3. type INDEX CREATE on the command line of the CONTENTS window
4. enter the index name, attributes, and key variable(s) and issue RUN on the command line
5. issue the END command to go back to display manager.

CONFIRMATION OF THE INDEX

Once an index has been created, the next logical step is to confirm its presence in the data set. One approach is to look for the file in the operating system file structure (valid for all systems except MVS). The choices using SAS software are to use the CONTENTS procedure and to look interactively using the ACCESS, DIR, or VAR windows.

The CONTENTS procedure and the ACCESS window will give detailed information about the index. The DIR window will report the presence of the index, and the VAR window will report the variables used as index keys.

[Output from PROC CONTENTS: Confirmation of the Index. Data Set Name: LIB.ONE; Engine: V606; Observations: 3011; Variables: 9; Observation Length: 83; Deleted Observations: 0; Number of Data Set Pages: 35; Number of Index File Pages: 36; Release Created: 6.06; Release Last Modified: 6.06; Alphabetic List of Indexes and Attributes: DSORG, LRECL, RECFM]

Notice the Index File Page Size and Number of Index File Pages. This information shows the amount of DASD that the index requires. The page size is chosen by the SAS System and cannot be modified by the SAS user. A SAS System option, MSGLEVEL=I, can be set so that an informational note is written to the SAS log when the index is used.

WHEN AN INDEX IS USED

Once an index has been created, there is no way that you can force its use for WHERE optimization. That decision is left to the SAS System. However, there are some cases when an index is never used.
They are

- FIND and SEARCH commands available with PROC FSEDIT
- a subsetting IF statement in the DATA step
- when a BY statement conflicts with a WHERE statement, that is, when no single index satisfies both the BY and WHERE statements.

An index will be used for BY processing if one is available. The SAS System can choose to use the index when using a WHERE statement. The costing algorithm compares the number of I/Os it would take to retrieve the observations via a sequential pass of the data versus the cost of retrieving the observations via the index. The most cost-efficient method is selected.

Because of the design of the costing algorithm, some guidelines are suggested to help you determine whether or not to index your SAS data sets:

- Do not index a data set to be used with WHERE processing if you expect to retrieve more than one-third of the total number of observations.
- Do not index unless the data set occupies at least three pages (shown by PROC CONTENTS or PROC DATASETS).
- Keep the number of indexes per data set to a minimum.
- Index data sets where the values of the key variables are uniformly distributed.

Each of these suggestions is addressed in detail in the following sections.

Select a Small Subset of Observations

If you use a WHERE statement to select observations from a SAS data set, having your data set indexed can be an advantage if your WHERE statement selects a small number of observations from the input data set, generally one-third or less. This guideline is based on the fact that processing a data set sequentially is often more efficient when a large percentage of the data set observations are selected. For example, compare the idea of an index with the card catalog system in a library. If you are going to choose 75 percent of all the books in the library, it would be faster to walk through the shelves and gather the books than to look for each title in the card catalog and return to the appropriate shelf multiple times.

Do Not Index Small Data Sets

You should not create an index unless the data set occupies at least three pages. This suggestion is based on the fact that the index file will contain all the values of the key variable as well as a record identifier. With a small data set, your index file could be almost as large as the data file itself. Also, it is usually just as efficient to make a sequential pass of the data as it would be to find the appropriate node in the index tree and then retrieve the observation(s).
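Both of these guidelines can be checked before an index is created. The sketch below is one possible way to do so, under stated assumptions: the data set WORK.SALES, its variables ID and REGION, and the generated values are hypothetical stand-ins, and the one-third threshold is the rule of thumb from the text rather than something SAS reports.

```sas
/* Hypothetical demonstration data: 1000 observations, with        */
/* REGION cycling through the values N, E, S, and W.               */
data work.sales;
   length region $ 1;
   do id = 1 to 1000;
      region = substr('NESW', mod(id, 4) + 1, 1);
      output;
   end;
run;

/* Guideline: at least three pages. PROC CONTENTS reports the      */
/* number of data set pages among the engine-dependent details.    */
proc contents data=work.sales;
run;

/* Guideline: select one-third of the observations or less.        */
/* In PROC SQL a comparison evaluates to 1 or 0, so its mean is    */
/* the fraction of observations the WHERE condition would select.  */
proc sql;
   select count(*)            as total_obs,
          sum(region = 'N')   as selected_obs,
          mean(region = 'N')  as fraction_selected
   from work.sales;
quit;
```

If the reported fraction is well above one-third, or the data set occupies fewer than three pages, the guidelines above suggest that an index on that variable is unlikely to pay for itself.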
Keep the Number of Indexes to a Minimum

Once an index is created, it is automatically maintained by the SAS System. Anytime you add or delete observations from the data set, the index structure must be changed to reflect the data set changes. Also, changing a value of a key index variable in the data set requires that the node in the index structure be deleted and then a new node added to represent the new value. With multiple indexes on one data set, the resource costs can increase dramatically. All indexes for a particular SAS data set are stored in the same index file, and the size of the file is determined by the number of indexes you have created. You should consider the size of the index file when deciding whether to create multiple indexes.

Index Uniformly Distributed Data

The costing algorithm employed to determine whether to use an index determines the minimum value and the maximum value of the key variable, and then looks at the selection criteria (the WHERE statement specified) to determine approximately how many observations will be selected. The algorithm assumes that the data are evenly distributed between the minimum and maximum. If this is not the case, the algorithm may decide to retrieve the observations via the index when a sequential pass of the data would in fact be more efficient.

AN EXAMPLE

The following example illustrates some of the factors that should be considered when deciding whether to index a SAS data set. Let us look at an application using a composite index in the CMS environment. The application is the generation of a report using data from the automotive industry. It prepares a report of vehicles tested at specific test sites across the country. You have approximately unique values of VEHICLE, the vehicle identification number, that are uniformly distributed. The values of TESTSITE, city and state, are not uniformly distributed: the value CARY,NC accounts for approximately 2/3 of the data values for TESTSITE.
Your final report will list about 34 observations.

In this example, a composite index was created using the variables TESTSITE and VEHICLE, in this order. Next, a WHERE clause was used to subset the data using these two variables as selection criteria.

   WHERE TESTSITE='value1' AND VEHICLE='value2';

The result of subsetting the data this way was an increase in the VCPU and TCPU statistics (reported by the STIMER and STATS system options). VCPU, virtual CPU time, represents the CPU time spent executing within your virtual machine. TCPU, total CPU time, represents the VCPU time plus CPU time spent executing CP systems devices on behalf of your job. The TCPU statistic reflects the I/O resources.

To explain this decrease in performance, you must look at several factors. First, the variable TESTSITE is not a good candidate for an index since one value represents so many of the observations of the data. The variable VEHICLE is a good candidate since its data are evenly distributed. Second, only the first key variable in a composite index is used for WHERE clause optimization. This means that our composite index had approximately the same effect as a simple index on the variable TESTSITE, which we have already stated is not a good candidate for indexing.

To improve performance in this example, delete the composite index and create a simple index on the most discriminating variable. In our example, this would be VEHICLE. The SAS System will then retrieve the observations meeting the selection criteria for VEHICLE via the index. Next, it will sequentially process this subset looking for the appropriate values of TESTSITE.

CONCLUSION

Indexes can be very effective in some situations, but can actually degrade performance in others. You should evaluate your data and application carefully before you decide to index your SAS data sets. As with any performance feature, there are advantages and disadvantages to weigh.
Your decision should be based on which resources are most important to conserve in your computing environment. If disk space is at a premium, then you should consider the fact that the index or indexes take extra disk space. On the other hand, if I/O time is important, then you should consider creating an index to reduce the time needed to retrieve observations from your SAS data sets. Keep in mind that you can only provide the index, not force the SAS System to use it.
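The change recommended in AN EXAMPLE (drop the composite index, then index the most discriminating variable) might look something like the sketch below. It is an illustration under assumptions, not the original application: the data set WORK.TESTS, the generated vehicle identifiers, the second test site, and the composite index name SITECAR are hypothetical; only the variable names TESTSITE and VEHICLE and the value CARY,NC come from the text.

```sas
/* Hypothetical stand-in data: VEHICLE values are unique, and      */
/* TESTSITE is skewed so that CARY,NC covers about two-thirds.     */
data work.tests;
   length testsite $ 12 vehicle $ 8;
   do i = 1 to 3000;
      vehicle = cats('V', put(i, z6.));
      if mod(i, 3) = 0 then testsite = 'AUSTIN,TX';
      else testsite = 'CARY,NC';
      output;
   end;
   drop i;
run;

/* Starting point from the example: a composite index with         */
/* TESTSITE as the first key (index name SITECAR is hypothetical). */
proc datasets library=work nolist;
   modify tests;
   index create sitecar=(testsite vehicle);
quit;

/* Suggested fix: delete the composite index and create a simple   */
/* index on VEHICLE, the most discriminating variable.             */
proc datasets library=work nolist;
   modify tests;
   index delete sitecar;
   index create vehicle;
quit;

/* Note in the log whether the index is chosen for the WHERE.      */
options msglevel=i;

data work.report;
   set work.tests;
   where testsite = 'CARY,NC' and vehicle = 'V000100';
run;
```

Whether the index on VEHICLE is actually used is still decided by the SAS System; the MSGLEVEL=I note in the log is how to confirm it for a given run.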
More informationAdvanced Database Systems
Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed