Efficient Use of SAS® Data Set Indexes in SAS® Applications


Sally Painter, SAS Institute Inc., Cary, NC

ABSTRACT

By indexing your SAS data sets, you can run certain types of applications more efficiently. The ability to index SAS data sets is available in Release 6.06 of the SAS System. This paper discusses the costs associated with indexing a SAS data set and the types of applications where the benefits of having a SAS data set indexed outweigh the costs of implementing and maintaining the index. The target audience is users who will be designing applications for Release 6.06 and later releases of the SAS System.

INTRODUCTION

The ability to index SAS data sets is an enhancement in Version 6 of the SAS System. Having a SAS data set indexed can, in some cases, provide faster access to a subset of the data. A secondary benefit is that all values returned via an index are returned in sorted order. However, you should not create an index just to avoid sorting the data set. In most cases, the costs of maintaining the index would outweigh the CPU reduction gained by eliminating the SORT procedure step. This paper attempts to shed some light on the types of applications that are most likely to benefit from the use of an index.

WHAT AN INDEX IS

An index is an auxiliary data structure that does not appear as a separate file to the SAS System. On all operating systems except MVS, it does appear as a separate file to the operating system. You should never manipulate the index (move, delete, and so on) except with the SAS System. Logically, the index is an inverted tree structure that contains data values and location identifiers for the values of a key variable or variables. The SAS index structure is implemented as a self-balanced tree, which means that every leaf is the same distance from the root. This is important because it provides a uniform cost to access the leaves.
In addition, any change to the SAS data set that affects the data values of the indexed variables will cause the index to be modified.

HOW AN INDEX IS BUILT

By diagramming the logic of the index structure for a particular case, you will be able to see why values returned via the index are returned in sorted order. Also, the diagram illustrates the concept of a balanced structure. Note: The internal structure of the index would look different.

Suppose that you have a variable named REGION that you want to be the key variable for your index. The values of REGION are N, E, S, and W. The number of levels in the index structure is based on the number of unique values of the key variable, which in this case is REGION. Since REGION has only four values, our logical diagram of the index will contain three levels: the root (level 2), the nodes (level 1), and the leaves (level 0).

Level 2 is the root of the index. There is always only one root. Level 1 contains the parent nodes. Each parent contains the highest value of REGION that lives under it. The parent node also contains a NID (child node identifier), which is the location of each child. Under each parent are two children. Each child, or leaf, contains a unique value of the key variable and the RID (record identifier). The complete set of level 0 nodes contains all values of the key variable. Also, if you read the leaves from left to right, you will notice that the values are in sorted order.

[Figure: logical diagram of the index on REGION. The level 2 root sits above the level 1 nodes, which hold the entries (N, NID) and (W, NID) and point to the level 0 leaves.]

TYPES OF INDEXES

The SAS System under Version 6 supports two types of indexes: regular and composite. A regular index is based on the value of one key variable. The name of a regular index is the same as the name of the key variable on which the index is based. A composite index is based on the value of two or more key variables whose values are concatenated together to form one string.
When choosing a name for a composite index, you must select a name that is different from the name of any variable in the data set. The first key variable of a composite index can also be used by the SAS System just as if it were a regular index. For example, suppose you have created a composite index on the key variables LNAME and FNAME (in this order) and are using LNAME for BY processing in your SAS application. The SAS System could use this composite index to retrieve observations BY LNAME since it is first in the concatenation.

In Release 6.06, all key variables of a composite index are used in BY processing. For example, if your composite index is composed of three key variables and you set your data set by the same three variables (using the SET and BY statements), the SAS System retrieves the observations via the index. For WHERE clause optimization, only the first key variable of the composite index is used, regardless of how many of the key variables are specified in the WHERE statement. An example of this is given later in this paper.

You can create multiple regular or composite indexes on any Version 6 disk format SAS data set. You can also have any combination of regular and composite indexes. However, be aware that having a large number of indexes on any data set can be costly in terms of upkeep.
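The LNAME and FNAME case described above can be tried with a short sketch such as the following. It is an illustration only, written for a current SAS release: the data set WORK.STAFF, the variable DEPT, and the index name FULLNAME are hypothetical, and the LIBRARY= spelling of the PROC DATASETS option is the one used in current documentation.

```sas
/* Hypothetical data set WORK.STAFF; every name here is illustrative. */
data work.staff;
   length lname fname $ 12 dept $ 8;
   input lname $ fname $ dept $;
   datalines;
Smith Ann Sales
Jones Bob Audit
Smith Carl Audit
Adams Dee Sales
;
run;

proc datasets library=work nolist;
   modify staff;
   index create dept;                     /* regular index: named after its key variable */
   index create fullname=(lname fname);   /* composite index: needs a name of its own */
quit;

/* The data set is not sorted by LNAME. The BY statement can still be
   satisfied through the composite index, because LNAME is its first
   key variable, so no PROC SORT step is needed here. */
data work.by_lname;
   set work.staff;
   by lname;
run;
```

Because the observations come back through the index, WORK.BY_LNAME should be ordered by LNAME even though WORK.STAFF was never sorted.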

CREATING AN INDEX

When creating an index, you can also specify attributes. Valid attributes are:

UNIQUE   used when the key variable will not contain duplicate values in the data set. This attribute, once assigned, makes it impossible for duplicate values of the key variable to be added to the data set. This is handy when working with data that should not, for any reason, contain duplicates.

NOMISS   used when missing values will not be included in the index, but may exist in the data set. If the NOMISS attribute is specified, you cannot use a WHERE statement that expects to return missing values. Also, your index structure will be physically smaller because it will not have nodes to identify the missing values.

There are several ways to create an index on a SAS data set. You can:

- use the DATASETS procedure
- use the SQL procedure
- use the IML procedure
- create the index interactively through the ACCESS window.

Note: Not all types and attributes are available with each method.

The syntax for creating or deleting an index using PROC DATASETS is

   PROC DATASETS IN=libref;
      MODIFY SAS_data_set;
      INDEX CREATE index_name / attributes;                    /* regular index */
      INDEX CREATE index_name=(variable-list) / attributes;    /* composite index */
      INDEX DELETE index_name;

The syntax for creating or deleting an index using PROC SQL is

   PROC SQL;
      CREATE INDEX <UNIQUE> index_name ON SAS_data_set(key_variable(s));
      DROP INDEX index_name FROM SAS_data_set;

Using the ACCESS window, you should follow these steps:

- type ACCESS on any SAS Display Manager System command line
- enter a C beside the data set that will contain the index
- type INDEX CREATE on the command line of the CONTENTS window
- enter the index name, attributes, and key variable(s) and issue RUN on the command line
- issue the END command to go back to display manager.

CONFIRMATION OF THE INDEX

Once an index has been created, the next logical step is to confirm its presence in the data set. One approach is to look for the file in the operating system file structure (valid for all systems except MVS). The choices using SAS software are to use the CONTENTS procedure and to look interactively using the ACCESS, DIR, or VAR windows.

The CONTENTS procedure and the ACCESS window will give detailed information about the index. The DIR window will report the presence of the index, and the VAR window will report the variables used as index keys.

[Output from PROC CONTENTS: Confirmation of the Index. The listing is only partly legible in this transcription. It describes data set LIB.ONE (engine V606, created and last modified under Release 6.06) with 3011 observations, 9 variables, an observation length of 83, and 0 deleted observations. The data set occupies 35 data set pages and the index file occupies 36 index file pages. The variable list includes the character variables CREATED, DSORG, LRECL, RECFM, TRK, and VOLSER, and the alphabetic list of indexes shows DSORG, LRECL, RECFM, and VOLSER.]

Notice the Index File Page Size and Number of Index File Pages. This information shows the amount of DASD that the index requires. The page size is chosen by the SAS System and cannot be modified by the SAS user. A SAS System option, MSGLEVEL=I, can be set so that an informational note is written to the SAS log when the index is used.

WHEN AN INDEX IS USED

Once an index has been created, there is no way that you can force its use for WHERE optimization. That decision is left to the SAS System. However, there are some cases when an index is never used.
They are:

- FIND and SEARCH commands available with PROC FSEDIT
- a subsetting IF statement in the DATA step
- when a BY statement conflicts with a WHERE statement, that is, when no single index satisfies both the BY and WHERE statements.

An index will be used for BY processing if one is available. The SAS System can choose to use the index when using a WHERE statement. The costing algorithm compares the number of I/Os it would take to retrieve the observations via a sequential pass of the data versus the cost of retrieving the observations via the index. The most cost-efficient method is selected. Because of the design of the costing algorithm, some guidelines are suggested to help you determine whether or not to index your SAS data sets:

- Do not index a data set to be used with WHERE processing if you expect to retrieve more than one-third of the total number of observations.
- Do not index unless the data set occupies at least three pages (shown by PROC CONTENTS or PROC DATASETS).
- Keep the number of indexes per data set to a minimum.
- Index data sets where the values of the key variables are uniformly distributed.

Each of these suggestions is addressed in detail in the following sections.

Select a Small Subset of Observations

If you use a WHERE statement to select observations from a SAS data set, having your data set indexed can be an advantage if your WHERE statement selects a small number of observations from the input data set, generally one-third or less. This guideline is based on the fact that processing a data set sequentially is often more efficient when a large percentage of the data set observations are selected. For example, compare the idea of an index with the card catalog system in a library. If you are going to choose 75 percent of all the books in the library, it would be faster to walk through the shelves and gather the books than to look for each title in the card catalog and return to the appropriate shelf multiple times.

Do Not Index Small Data Sets

You should not create an index unless the data set occupies at least three pages. This suggestion is based on the fact that the index file will contain all the values of the key variable as well as a record identifier. With a small data set, your index file could be almost as large as the data file itself. Also, it is usually just as efficient to make a sequential pass of the data as it would be to find the appropriate node in the index tree and then retrieve the observation(s).
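The first two guidelines can be checked with a small experiment like the one below. This is a sketch under stated assumptions, not a benchmark: the data set WORK.TESTS and its variables VEHICLE and READING are invented for illustration, and the code is written for a current SAS release. The data set is made large enough to span many pages, and the WHERE statement selects 100 of the 100,000 observations.

```sas
/* Hypothetical test data: 100,000 observations with 1,000 evenly
   distributed VEHICLE values (100 observations per value). */
data work.tests;
   length vehicle $ 8;
   do i = 1 to 100000;
      vehicle = 'V' || put(mod(i, 1000), z4.);
      reading = i;
      output;
   end;
   drop i;
run;

proc sql;
   create index vehicle on work.tests(vehicle);   /* simple index named after its key */
quit;

/* Check the number of data set pages and index file pages. */
proc contents data=work.tests;
run;

options msglevel=i;   /* request a log note whenever an index is used */

/* Selects far fewer than one-third of the observations. */
proc print data=work.tests;
   where vehicle = 'V0042';
run;
```

With MSGLEVEL=I in effect, the log should state whether the VEHICLE index was chosen for the WHERE statement. Widening the WHERE statement so that it selects most of the observations is a simple way to see when the SAS System falls back to a sequential pass.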
Keep the Number of Indexes to a Minimum

Once an index is created, it is automatically maintained by the SAS System. Anytime you add or delete observations from the data set, the index structure must be changed to reflect the data set changes. Also, changing a value of a key index variable in the data set requires that the node in the index structure be deleted and then a new node added to represent the new value. With multiple indexes on one data set, the resource costs can increase dramatically. All indexes for a particular SAS data set are stored in the same index file, and the size of the file is determined by the number of indexes you have created. You should consider the size of the index file when deciding whether to create multiple indexes.

Index Uniformly Distributed Data

The costing algorithm employed to determine whether to use an index determines the minimum value and the maximum value of the key variable, and then looks at the selection criteria (the WHERE statement specified) to determine approximately how many observations will be selected. The algorithm assumes that the data are evenly distributed between the minimum and maximum. If this is not the case, the algorithm may decide to retrieve the observations via the index when a sequential pass of the data would in fact be more efficient.

AN EXAMPLE

The following example illustrates some of the factors that should be considered when deciding whether to index a SAS data set. Let us look at an application using a composite index in the CMS environment. The application is the generation of a report using data from the automotive industry. It prepares a report of vehicles tested at specific test sites across the country. You have approximately [value missing] unique values of VEHICLE, the vehicle identification number, that are uniformly distributed. The values of TESTSITE, city and state, are not uniformly distributed: the value CARY,NC accounts for approximately two-thirds of the data values for TESTSITE.
Your final report will list about 34 observations.

In this example, a composite index was created using the variables TESTSITE and VEHICLE, in this order. Next, a WHERE clause was used to subset the data using these two variables as selection criteria.

   WHERE TESTSITE='value1' AND VEHICLE='value2';

The results from this subset of the data were an increase in the VCPU and TCPU statistics (reported by the STIMER and STATS system options). VCPU, virtual CPU time, represents the CPU time spent executing within your virtual machine. TCPU, total CPU time, represents the VCPU time plus CPU time spent executing CP systems devices on behalf of your job. The TCPU statistic reflects the I/O resources.

To explain this decrease in performance, you must look at several factors. First, the variable TESTSITE is not a good candidate for an index since one value represents so many of the observations of the data. The variable VEHICLE is a good candidate since its data are evenly distributed. Second, only the first key variable in a composite index is used for WHERE clause optimization. This means that our composite index had approximately the same effect as a simple index on the variable TESTSITE, which we have already identified as a poor candidate for indexing.

To improve performance in this example, delete the composite index and create a simple index on the most discriminating variable. In our example, this would be VEHICLE. The SAS System will then retrieve the observations meeting the selection criteria for VEHICLE='value2' via the index. Next, it will sequentially process this subset looking for the appropriate values of TESTSITE.

CONCLUSION

Indexes can be very effective in some situations, but can actually degrade performance in others. You should evaluate your data and application carefully before you decide to index your SAS data sets. As with any performance feature, there are advantages and disadvantages to weigh.
Your decision should be based on which resources are most important to conserve in your computing environment. If disk space is at a premium, then you should consider the fact that the index or indexes take extra disk space. On the other hand, if I/O time is important, then you should consider creating an index to reduce the time needed to retrieve observations from your SAS data sets. Keep in mind that you can only provide the index, not force the SAS System to use it.
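As a closing illustration, the remedy suggested in the example section (dropping the composite index in favor of a simple index on VEHICLE) might be coded along the following lines. This is a sketch only: the data set WORK.TESTS, the index name SITEVEH, the site value RALEIGH,NC, and the generated data are all assumptions made for illustration, and MSGLEVEL=I is used to see what the SAS System decides rather than to force a choice.

```sas
/* Hypothetical data: TESTSITE is skewed (about two-thirds CARY,NC),
   VEHICLE is evenly distributed over 3,000 values. */
data work.tests;
   length testsite $ 12 vehicle $ 8;
   do i = 1 to 90000;
      if mod(i, 3) = 0 then testsite = 'RALEIGH,NC';
      else testsite = 'CARY,NC';
      vehicle = 'V' || put(mod(i, 3000), z4.);
      output;
   end;
   drop i;
run;

/* Starting point from the example: a composite index led by TESTSITE. */
proc datasets library=work nolist;
   modify tests;
   index create siteveh=(testsite vehicle);
quit;

/* Suggested remedy: replace it with a simple index on VEHICLE. */
proc datasets library=work nolist;
   modify tests;
   index delete siteveh;
   index create vehicle;
quit;

options msglevel=i;   /* log a note when an index is used */

data work.report;
   set work.tests;
   where testsite = 'CARY,NC' and vehicle = 'V0043';
run;
```

Running the final DATA step once with only the composite index and once with only the simple index, and comparing the log notes and resource statistics, is one way to repeat the comparison described in the example on your own data.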

REFERENCES

Beatrous, Stephen and William Clifford. "Version 6 SAS® Data Base System Architecture: Current and Future Features." Proceedings of the Thirteenth Annual SAS Users Group International Conference.

Clifford, William D. et al. "Using New SAS® Database Features and Options." Proceedings of the Fourteenth Annual SAS Users Group International Conference.

5 412

Stephen M. Beatrous, SAS Institute Inc., Cary, NC John T. Stokes, SAS Institute Inc., Austin, TX

Stephen M. Beatrous, SAS Institute Inc., Cary, NC John T. Stokes, SAS Institute Inc., Austin, TX 1/0 Performance Improvements in Release 6.07 of the SAS System under MVS, ems, and VMS' Stephen M. Beatrous, SAS Institute Inc., Cary, NC John T. Stokes, SAS Institute Inc., Austin, TX INTRODUCTION The

More information

An Introduction to Compressing Data Sets J. Meimei Ma, Quintiles

An Introduction to Compressing Data Sets J. Meimei Ma, Quintiles An Introduction to Compressing Data Sets J. Meimei Ma, Quintiles r:, INTRODUCTION This tutorial introduces compressed data sets. The SAS system compression algorithm is described along with basic syntax.

More information

Presented at SEUGI '92 by Colin Harris,SAS Institute

Presented at SEUGI '92 by Colin Harris,SAS Institute Presented at SEUGI '92 by Colin Harris,SAS Institute Database Features Extend The Scope of SAS/SHARE@ Software William D. Clifford, SAS Institute Inc., Austin, TX ABSTRACT The role of SAS/SHARE@ software

More information

Simple Rules to Remember When Working with Indexes

Simple Rules to Remember When Working with Indexes Simple Rules to Remember When Working with Indexes Kirk Paul Lafler, Software Intelligence Corporation, Spring Valley, CA Abstract SAS users are always interested in learning techniques related to improving

More information

Chapter 1. Introduction to Indexes

Chapter 1. Introduction to Indexes Chapter 1 Introduction to Indexes The Index Concept 2 The Index as a SAS Performance Tool 2 Types of SAS Applications That May Benefit from Indexes 4 How SAS Indexes Are Structured 4 Types of SAS Indexes

More information

Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA

Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA Indexing and Compressing SAS Data Sets: How, Why, and Why Not Andrew H. Karp Sierra Information Services, Inc. San Francisco, California USA Many users of SAS System software, especially those working

More information

Optimizing System Performance

Optimizing System Performance 243 CHAPTER 19 Optimizing System Performance Definitions 243 Collecting and Interpreting Performance Statistics 244 Using the FULLSTIMER and STIMER System Options 244 Interpreting FULLSTIMER and STIMER

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

Hash-Based Indexing 165

Hash-Based Indexing 165 Hash-Based Indexing 165 h 1 h 0 h 1 h 0 Next = 0 000 00 64 32 8 16 000 00 64 32 8 16 A 001 01 9 25 41 73 001 01 9 25 41 73 B 010 10 10 18 34 66 010 10 10 18 34 66 C Next = 3 011 11 11 19 D 011 11 11 19

More information

Chapter 12: Query Processing. Chapter 12: Query Processing

Chapter 12: Query Processing. Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See  for conditions on re-use Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files Static

More information

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6)

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6) CompSci 16 Intensive Computing Systems Lecture 7 Storage and Index Instructor: Sudeepa Roy Announcements HW1 deadline this week: Due on 09/21 (Thurs), 11: pm, no late days Project proposal deadline: Preliminary

More information

Why Is This Important? Overview of Storage and Indexing. Components of a Disk. Data on External Storage. Accessing a Disk Page. Records on a Disk Page

Why Is This Important? Overview of Storage and Indexing. Components of a Disk. Data on External Storage. Accessing a Disk Page. Records on a Disk Page Why Is This Important? Overview of Storage and Indexing Chapter 8 DB performance depends on time it takes to get the data from storage system and time to process Choosing the right index for faster access

More information

SYSTEM 2000 Essentials

SYSTEM 2000 Essentials 7 CHAPTER 2 SYSTEM 2000 Essentials Introduction 7 SYSTEM 2000 Software 8 SYSTEM 2000 Databases 8 Database Name 9 Labeling Data 9 Grouping Data 10 Establishing Relationships between Schema Records 10 Logical

More information

APPENDIX 3 Tuning Tips for Applications That Use SAS/SHARE Software

APPENDIX 3 Tuning Tips for Applications That Use SAS/SHARE Software 177 APPENDIX 3 Tuning Tips for Applications That Use SAS/SHARE Software Authors 178 Abstract 178 Overview 178 The SAS Data Library Model 179 How Data Flows When You Use SAS Files 179 SAS Data Files 179

More information

SAS File Management. Improving Performance CHAPTER 37

SAS File Management. Improving Performance CHAPTER 37 519 CHAPTER 37 SAS File Management Improving Performance 519 Moving SAS Files Between Operating Environments 520 Converting SAS Files 520 Repairing Damaged Files 520 Recovering SAS Data Files 521 Recovering

More information

CSIT5300: Advanced Database Systems

CSIT5300: Advanced Database Systems CSIT5300: Advanced Database Systems L08: B + -trees and Dynamic Hashing Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong SAR,

More information

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana Introduction to Indexing 2 Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana Indexed Sequential Access Method We have seen that too small or too large an index (in other words too few or too

More information

Indexing. Chapter 8, 10, 11. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Indexing. Chapter 8, 10, 11. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Indexing Chapter 8, 10, 11 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Tree-Based Indexing The data entries are arranged in sorted order by search key value. A hierarchical search

More information

Chapter 17 Indexing Structures for Files and Physical Database Design

Chapter 17 Indexing Structures for Files and Physical Database Design Chapter 17 Indexing Structures for Files and Physical Database Design We assume that a file already exists with some primary organization unordered, ordered or hash. The index provides alternate ways to

More information

Storage and Indexing

Storage and Indexing CompSci 516 Data Intensive Computing Systems Lecture 5 Storage and Indexing Instructor: Sudeepa Roy Duke CS, Spring 2016 CompSci 516: Data Intensive Computing Systems 1 Announcement Homework 1 Due on Feb

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Database System Concepts, 5th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

PharmaSUG Paper BB01

PharmaSUG Paper BB01 PharmaSUG 2014 - Paper BB01 Indexing: A powerful technique for improving efficiency Arun Raj Vidhyadharan, inventiv Health, Somerset, NJ Sunil Mohan Jairath, inventiv Health, Somerset, NJ ABSTRACT The

More information

Physical Level of Databases: B+-Trees

Physical Level of Databases: B+-Trees Physical Level of Databases: B+-Trees Adnan YAZICI Computer Engineering Department METU (Fall 2005) 1 B + -Tree Index Files l Disadvantage of indexed-sequential files: performance degrades as file grows,

More information

Performance Considerations

Performance Considerations 149 CHAPTER 6 Performance Considerations Hardware Considerations 149 Windows Features that Optimize Performance 150 Under Windows NT 150 Under Windows NT Server Enterprise Edition 4.0 151 Processing SAS

More information

CPS352 Lecture - Indexing

CPS352 Lecture - Indexing Objectives: CPS352 Lecture - Indexing Last revised 2/25/2019 1. To explain motivations and conflicting goals for indexing 2. To explain different types of indexes (ordered versus hashed; clustering versus

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Chapter 13: Query Processing

Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

Overview of Storage and Indexing

Overview of Storage and Indexing Overview of Storage and Indexing Chapter 8 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Data on External Storage Disks: Can retrieve random page at fixed cost But reading several consecutive

More information

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Chapter 13: Query Processing Basic Steps in Query Processing

Chapter 13: Query Processing Basic Steps in Query Processing Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join

More information

Using SAS Files. Introduction CHAPTER 5

Using SAS Files. Introduction CHAPTER 5 123 CHAPTER 5 Using SAS Files Introduction 123 SAS Data Libraries 124 Accessing SAS Files 124 Advantages of Using Librefs Rather than OpenVMS Logical Names 124 Assigning Librefs 124 Using the LIBNAME Statement

More information

Overview of Storage and Indexing

Overview of Storage and Indexing Overview of Storage and Indexing Yanlei Diao UMass Amherst Feb 21, 2006 Slides Courtesy of R. Ramakrishnan and J. Gehrke 1 DBMS Architecture Query Parser Query Rewriter Query Optimizer Query Executor Lock

More information

Transform & Conquer. Presorting

Transform & Conquer. Presorting Transform & Conquer Definition Transform & Conquer is a general algorithm design technique which works in two stages. STAGE : (Transformation stage): The problem s instance is modified, more amenable to

More information

Chapter 11: Indexing and Hashing" Chapter 11: Indexing and Hashing"

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing" Database System Concepts, 6 th Ed.! Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use " Chapter 11: Indexing and Hashing" Basic Concepts!

More information

Physical Database Design: Outline

Physical Database Design: Outline Physical Database Design: Outline File Organization Fixed size records Variable size records Mapping Records to Files Heap Sequentially Hashing Clustered Buffer Management Indexes (Trees and Hashing) Single-level

More information

SAS Data View and Engine Processing. Defining a SAS Data View. Advantages of SAS Data Views SAS DATA VIEWS: A VIRTUAL VIEW OF DATA

SAS Data View and Engine Processing. Defining a SAS Data View. Advantages of SAS Data Views SAS DATA VIEWS: A VIRTUAL VIEW OF DATA SAS DATA VIEWS: A VIRTUAL VIEW OF DATA John C. Boling SAS Institute Inc., Cary, NC Abstract The concept of a SAS data set has been extended or broadened in Version 6 of the SAS System. Two SAS file structures

More information

B.H.GARDI COLLEGE OF MASTER OF COMPUTER APPLICATION. Ch. 1 :- Introduction Database Management System - 1
