Will Your Data Warehouse Stand the Test of Time?
David Annis, Amadeus Data Processing, Germany

As storage becomes cheaper, we have to be more careful, not less, about how we design our historical databases.

Whatever the style or architecture of your data, and whether or not it conforms to the standards for a data warehouse, we all benefit from the increased awareness amongst users and managers of 'Data Warehousing' as a concept. As the demand for information delivery becomes greater, there is a good chance that you will be given enough disk space to keep several years' worth of data on-line, thus avoiding the administrative overhead of archiving and restoring old information. With this benefit, however, comes an added responsibility to design your databases and update procedures in such a way that they will still be feasible in several years' time.

In this paper, I will examine two aspects of dealing with large amounts of historical data that will probably be common to all business areas:

1. The nightly update - getting yesterday's information into the warehouse.
2. Coding systems - storing and accessing coded information.

Amadeus

Amadeus is a global computerised reservation system that provides centralised access to travel services including airlines, cars and hotels. Within the Capacity and Performance Department, we have a responsibility to ensure that the systems and network resources are provided at optimum value for money to the customer. As a result, we also have to ensure that our own data analysis procedures do not use resources that would be better allocated to the main production systems.

The nightly update

The common objectives

- Format yesterday's business information for inclusion into the database.
- Ensure no duplication.
- Set up any indexes that are required.

Assumptions

When adding yesterday's data into your database, the objective is to make the process as automated and secure as possible. Since the procedures involved will not all be perfect, you will need to consider the possibility that the job will have to be re-run, and that any data already added will have to be replaced.

You also need to design the database so that frequent queries have a good response time. Typically this means indexing. It is probably safe to assume (or even enforce) that if a user makes a query against a historical database, he/she will be interested in a specific time period.

As an example, consider the work flow involved in updating the amadeus network performance database.

Example Work flow

[Figure: work flow. Network statistics from four resources (SNA, IDG, BBN, HYP) are each converted to a common format in SAS datasets, then combined and added to the database.]

This represents a fairly common type of work flow: different sources of business information being combined and perhaps summarised before being added into the main database(s). In our case, even with all the automated checks for data validity, a quick read of the daily report may still show that the process needs to be restarted from an earlier step. As a result, the process of adding to the database needs to ensure that yesterday's data is not simply duplicated.

Example Methods

The following example methods are extracts of code which could be used to add a daily dataset (TODAY) to a historical dataset (MAIN.CUSTOMER).

Method 1: Using SET.

    %let date=today()-1;
    data MAIN.CUSTOMER;
      set MAIN.CUSTOMER(where=(date ne &date))
          TODAY(where=(date eq &date));
    run;

Consequences

If any previous data from &DATE was added, it will be replaced. If the newly created data TODAY contains any data from another day, this will not be added.

Method 2: Using MERGE.

The example assumes one observation per customer per day.

    data MAIN.CUSTOMER;
      merge MAIN.CUSTOMER TODAY;
      by date customer;
    run;

Consequences

If the process is re-run, and the number of customers is different on the second run, some observations from the first run will not be replaced.

Disadvantage of Methods 1 and 2.

Although the consequences of methods 1 and 2 can be avoided with a little extra code, I am not going to expand on them, as they share one major disadvantage: both methods perform a sequential read of the main dataset. This means that the resources used by the nightly update would grow linearly over time.

Method 3: Using a unique index.

    data TODAY(INDEX=(key=(date customer) /UNIQUE));
      <more statements>
    /* the unique index must also exist on MAIN.CUSTOMER for duplicate keys to be rejected */
    proc append base=main.customer new=today;
    run;

Consequences

You could describe this method as an 'intelligent' version of Method 2 above. Note that in this method also, if the process is re-run, and the number of customers is different on the second run, some observations from the first run will not be replaced. This could be rectified by using the MODIFY statement to remove the observations in place, but this is only advisable if your database is backed up, as a system error in the middle of the data step could destroy your data.

This method does have the distinct advantage, however, that the data will be indexed in a useful way. It is useful because almost all queries against a historical database will subset by the date in some way or other.
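As an illustration of why an index that leads with the date is useful, the following sketch subsets MAIN.CUSTOMER by a date range. The output dataset name WEEK and the dates are hypothetical. Because DATE is the first variable of the composite key, SAS can consider the index for this WHERE clause; whether it actually uses it depends on how large a share of the data the range selects.

```sas
/* Ask SAS to report index usage decisions in the log */
options msglevel=i;

/* Hypothetical query: one week of data from the historical dataset */
data week;
  set main.customer(where=(date between '06JUN1994'd and '12JUN1994'd));
run;
```

With MSGLEVEL=I set, the log indicates whether the index was selected for the WHERE clause.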

The other consequences of this method are overheads in disk space and run-time resources. These will be discussed in a later section.

Method 4: Using a Journal

When designing the amadeus Network Performance Database, the major objective was the automation of the daily update, and ensuring that re-runs and exceptions could be handled simply. As a result, the first approach was to maintain a dataset containing all the dates where an update was performed. In other words, a simple journal dataset. This had the following benefits:

- Easy to avoid duplication of data.
- The update can be designed to run as many days as necessary to bring the database up to date.

On further examination, it became obvious that this journal dataset could also hold pointers (observation numbers) to the start and end of each day's data in the main database. The number of observations is also held in the journal dataset as a check, and the final structure is presented below.

[Figure: structure of the journal dataset (DATE, SOBS, EOBS, NOBS), with SOBS and EOBS pointing to the first and last observations of each day's data in the main dataset.]

This data structure is maintained by the macro %JAPPEND. This in turn uses two further macros which are described briefly below:

%nobs finds the number of observations from a dataset and puts the result into a global macro variable.

%jgetdate finds the date from a dataset. This works by searching from the middle of the dataset for the first observation with a non-missing date value. The macro is designed in this way mainly for performance data, where the records are likely to be sorted by time, with some data from just before or just after midnight at the beginning or end of the dataset. Note that in our example, this data is discarded.

For the sake of readability, I have removed the error handling code from the example overleaf.
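The %nobs helper is described above but not listed in the paper. A minimal sketch of how such a macro might look is given here, assuming a SAS release that provides CALL SYMPUTX; the body is an illustration, not the author's code. It relies on the NOBS= option of the SET statement, whose count includes observations that have been marked as deleted, which matches the physical observation numbers that POINT= uses.

```sas
/* Hypothetical sketch of %nobs: store the observation count of DATA
   in the global macro variable named by MACVAR */
%macro nobs(data=, macvar=);
  data _null_;
    /* NOBS= is filled in when the step is compiled, so the SET
       statement never has to execute */
    if 0 then set &data nobs=_n;
    /* 'G' writes the value to the global symbol table */
    call symputx("&macvar", _n, 'G');
    stop;
  run;
%mend nobs;

/* Example call */
%nobs(data=sashelp.class, macvar=classobs)
%put classobs=&classobs;
```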

    /*
      Macro JAPPEND is intended to append a day's worth of data to a
      history database. The following conditions are required:
      1. BASE and NEW datasets must have the same variables etc., as the
         append step does not use the FORCE option.
      2. A journal dataset must exist with the following variables:
         DATE, SOBS, EOBS, NOBS.
      3. The NEW dataset must contain a date or datetime variable.
      EXAMPLE CALL:
      %jappend(base=base,new=new,dateval=%str(datepart(dt)),journal=journal);
    */
    %macro jappend(base=,new=,journal=,dateval=);
      %jgetdate(data=&new,dateval=&dateval,result=newdate)
      %if &newdate=. %then ... ERROR ...
      *********************************************************************;
      * Check in journal for same date.                                     ;
      * The delete is done after the append using the _DELETE_ dataset      ;
      * and the DELOBS macrovar.                                            ;
      *********************************************************************;
      data _delete_;
        set &journal;
        where date=&newdate;
      run;
      %nobs(data=_delete_,macvar=delobs);
      **************************************************************;
      * Find the number of observations before and after the append ;
      **************************************************************;
      %nobs(data=&base,macvar=baseobs);
      proc append base=&base new=&new(where=(&dateval=&newdate));
      run;
      %nobs(data=&base,macvar=newobs);
      data _newobs_;
        date=&newdate;
        sobs=&baseobs+1;
        eobs=&newobs;
        nobs=&newobs-&baseobs;
        if nobs=0 then ... ERROR ...
      run;
      * If date already on the database, delete the data in place *********;
      %if &delobs>0 %then %do;
        data &base;
          set _delete_;
          do _obno_=sobs to eobs;
            modify &base point=_obno_;
            remove;
          end;
        run;
        **** Mark obs as deleted in journal before adding new entry;
        data &journal;
          set &journal;
          if date=&newdate then date=.;
        run;
      %end;
      **** Add new journal entry ******************************************;
      proc append force base=&journal new=_newobs_;
      run;
    %mend jappend;
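Before the first call, the journal dataset required by condition 2 has to exist. The following sketch shows one way it might be created, followed by the example call from the macro header. The DATE9. format is an assumption added for readability, and the sketch assumes that the BASE dataset already exists and that %nobs and %jgetdate are available.

```sas
/* Empty journal with the four variables that JAPPEND expects */
data journal;
  length date sobs eobs nobs 8;
  format date date9.;   /* assumption: display DATE as a calendar date */
  stop;                 /* write no observations */
run;

/* Append the daily dataset NEW, whose datetime variable is DT */
%jappend(base=base, new=new, dateval=%str(datepart(dt)), journal=journal);
```

Each successful call adds one observation to JOURNAL, so printing the journal gives a quick overview of which days are on the database.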

Queries Using a Journal

Of course, one disadvantage of developing a non-standard indexing scheme is that the index is not automatically used. The following macro, however, provides a simple way of making use of the index. Note that I am presenting a simplified version here. The production version of this macro also contains code for the creation of DATA step views, and for the optimisation of subsetting by variables other than the date. The optimisation is achieved by performing a binary search, as it is known that the daily data is sorted before being JAPPENDed to the database.

    %macro JGET(data=,out=,journal=,dateval=,sdate=,edate=);
      data &out(drop=sobs eobs nobs);
        set &journal(where=(&dateval between &sdate and &edate));
        do _obs_=sobs to eobs;
          set &data point=_obs_;
          output;
        end;
      run;
    %mend JGET;

Resources

One of the main reasons for developing the Journal Index method described above was the intuition that this would save considerable resources when compared to a unique index. Therefore we created some simulations to compare the disk utilisation and run-time utilisation of the two methods.

The figures given below are intended as a qualitative comparison of the two methods. The intention is to simulate how the methods would behave over a long period of time. Therefore I have tended to simplify the graphs and avoid using too many figures, as the emphasis is on the relative trend of the resources used rather than the actual numerical details.

Disk Space

As you might expect, tests indicated that, when using a unique index, the size of the index file was dependent only on the number of observations and the number of indexes. As a result, the overhead in percentage terms decreases as the number of non-indexed variables increases. A graph showing this relationship is shown below: all variables are 8 bytes, and there are 10,000 observations in the sample dataset.

Note that with a non-unique index, another influencing factor is the number of observations per group; but with 10,000 observations, the percentage of overhead was similar to the unique index for all levels of grouping.

[Figure: Disk overhead: Unique Index on Two 8 byte variables. Vertical axis: overhead, 0% to 100%; horizontal axis: No. of Other Variables.]

By comparison, the disk space overhead for the Journal Index is virtually zero, so the graph effectively shows the percentage savings of the Journal index over a standard SAS index.

Execution Resources

In order to compare the execution resource overhead of using a unique index, we ran some simulations in which we created a daily dataset containing a date plus one other numeric variable, both of which went to make up the unique index. We appended this repeatedly to a main dataset, changing the date each time.

In the first test, we appended 10,000 observations per day, and repeated this 100 times, so that the main dataset contained 1 million observations by the end of the test. For each iteration, we measured the CPU and I/O utilisation of the append step. The objective was to show whether the resources consumed by the append increased with the size of the base dataset. As a comparison, we did the same test using a unique index, and using a journal. A representation of the results of the CPU measurements is shown overleaf.
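The test described above could be set up along the following lines. This is a sketch of the unique-index variant only, not the code that produced the published measurements: the names MAIN, TODAY, ID and %simulate, and the starting date, are hypothetical. The FULLSTIMER system option writes the resource statistics of every step to the log.

```sas
options fullstimer;   /* report CPU and I/O statistics for every step */

/* Empty base dataset carrying the unique index on both variables */
data main(index=(key=(date id)/unique));
  length date id 8;
  stop;
run;

%macro simulate(days=100, obsperday=10000);
  %local day;
  %do day=1 %to &days;
    /* One day's data: a single date and OBSPERDAY distinct ids */
    data today;
      date='01JAN1994'd + &day;
      do id=1 to &obsperday;
        output;
      end;
    run;

    /* The step whose resource usage is of interest */
    proc append base=main new=today;
    run;
  %end;
%mend simulate;

%simulate(days=100, obsperday=10000)
```

For the journal variant, the base dataset would be created without the index and the PROC APPEND step replaced by a call to %jappend.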

[Figure: CPU Utilisation Comparison, Append Step Only. Series: Journal, Unique; horizontal axis: Years.]

In this representation I have extrapolated the results linearly to show the relative performance over a 4 year period. One surprising thing to note is that the CPU resources used by the unique index are lower until about year 3 in our example (about 11 million observations). This is in fact NOT due to the overhead of creating the journal, as this is negligible: PROC APPEND itself appears to use less CPU with a unique index than with no index at all. Adding more non-indexed variables into the example has the effect of moving the baseline for both techniques, but the convergence point remains the same.

In the same example, adding in the resources used by the creation step gives the following results.

[Figure: CPU Utilisation Comparison, Create and Append Steps Combined. Series: Journal, Unique; horizontal axis: Years, 0 to 4.]

In this case the convergence point is after 1.7 years (6.2 million observations).

And finally, the I/O utilisation patterns.

[Figure: I/O Utilisation Comparison, Create and Append Steps Combined. Horizontal axis: Years, 0 to 4.]

In the case of I/O, the unique index consumed more I/O resources from day 1, and grew slowly over time. The growth rate in our example extrapolated to approximately 25% over three years.

Resources Used by Non-Unique Indexes

Note that I have not shown any performance comparisons for non-unique indexes because they can be directly compared to one of the other two methods:

- Disk space: comparable to a unique index.
- I/O resources: comparable to a unique index.
- CPU resources: comparable to the unique index in the creation step, and to a journal index in the append step.

Conclusions

For the amadeus Network Performance Database, the journal index method has distinct advantages for the following reasons:

- The number of variables in the database is small, so the overheads of an index would be high in percentage terms.
- The number of key variables required to make the index unique would be high.
- The database is most frequently used for regular reporting of all network resources within a given date period, so having the database indexed by date only is sufficient most of the time.
- For ad-hoc queries, the date is always used as a subsetting factor, and this makes the response time acceptable in most cases.
- Within a day's worth of data, the dataset is sorted by resource, allowing other optimisations within the query process (which I have not described here).

If, however, your data is not suited to this form of index, it should certainly be possible to achieve a low rate of growth for the update and query process by intelligent use of SAS indexes, or by direct access with the POINT= option.

Coding Systems and Lookup Tables

Most of you have probably experienced some of the problems and frustrations that beset knowledge workers who are trying to maintain or query data going a long way back in time. One cause of this may be coding systems that have changed over time. As a designer of a data warehouse or historical database, it is wise to consider some additions to Murphy's law when applied to coding systems.

- Obsolete codes will be re-used to save introducing a new coding system.
- When the new coding system is introduced, no mapping will be possible to or from the old system.
- The date that they changed the coding system depends on whom you ask.
- The new codes look the same as the old codes; they just mean something different.

Some of these problems are not within the scope of this paper, and possibly not even within the capabilities of the author to answer. However, some can be alleviated by using date sensitive (or even time sensitive) lookups. This can be achieved using formats.

Example: Mapping Network Resources to Customer

Within amadeus, network resources are frequently reviewed and modified to provide optimum bandwidth to a customer. In order to produce management reports to show the utilisation of the network by customer, we ideally need to map the line identifier to the customer for any 15 minute interval. Considering that we normally learn of changes after the event, if we were to store the customer as part of the database, we would have to re-process the daily data whenever a change was made. Therefore we maintain a table that looks as follows:

    Resource Type   Resource ID   Effective Date   Customer
    NPSI            X             JUL94:00:00      Lufthansa
    NPSI            X             NOV94:15:15      Air France
    NPSI            X             SEP94:00:45      Iberia
    NPSI            X             NOV94:15:15      Lufthansa
    NPSI            X             MAR95:12:00      SAS
    NPSI            X             NOV94:15:15      Iberia
    NPSI            X             OCT94:18:00      Air France
    NPSI            X             NOV94:15:15      SAS

This table is then processed to create a dataset for processing by PROC FORMAT using the CNTLIN option. In this example, a character format called NETUSER is created.

    data formats;
      length start end $40 label $40;
      keep fmtname type start end label eexcl sexcl hlo;
      set data.netspeed end=last;
      by restype resid;
      /* Create enddate from next observation */
      if last.resid then enddate=...;   /* value elided; must be later than any datetime in the data */
      else do;
        next=_n_+1;
        set data.netspeed(keep=effdate rename=(effdate=enddate)) point=next;
      end;
      /* Join the code and the date to make the lookup */
      start=restype||resid||put(effdate,z12.);
      end=restype||resid||put(enddate,z12.);
      /* Other variables used by PROC FORMAT */
      fmtname='netuser';
      type='c';
      eexcl='y'; sexcl='n';   /* Include start, exclude end */
      label=user;
      output;
      /* Default label */
      if last then do;
        start='**other**';
        end='**other**';
        hlo='o';
        label=' ';
        output;
      end;
      return;
    run;

Once the format is compiled, the statement to find the customer based on the resource and the datetime becomes very simple:

    user=put(restype||resid||put(datetime,z12.),$netuser.);

One important thing to note is that the codes (in our case RESTYPE and RESID) should have the same format and length in the lookup table and in the database.

Advantages

By making your lookup tables date sensitive, it is possible to avoid many of the problems of changing coding systems. At the design stage, it is safest to assume that all codes and coding systems may change within the lifetime of your data warehouse. Of course, the length of the code may also change, so it is wise to leave some free space within the variable to account for this.

If (or when!) codes change, simply add these codes into your lookup table with a new effective date, run the format compile as shown above, and the change is reflected automatically from the new date, with no change necessary in any programs.
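The compile step itself is not listed in the paper. A minimal sketch of it, together with a lookup in context, is given here. The dataset names TRAFFIC and TRAFFIC_BY_USER are hypothetical; FORMATS is the control dataset built by the data step in the example above.

```sas
/* Compile the $NETUSER format from the control dataset */
proc format cntlin=formats;
run;

/* Hypothetical use: attach the customer to each observation of TRAFFIC,
   which is assumed to contain RESTYPE, RESID and DATETIME */
data traffic_by_user;
  set traffic;
  length user $40;
  user=put(restype||resid||put(datetime,z12.),$netuser.);
run;
```

Because the lookup is resolved when the query runs, re-compiling the format after a change to the lookup table is enough to correct historical reports, without re-processing the stored data.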

Summary

As a data warehouse or performance database designer two years ago, you were probably restricted in disk space to about two years' worth of data. Now it appears that the demand for historical information is growing at the same pace as the data itself. As designers today, therefore, it is our responsibility to ensure that the processes for updating and retrieving the information will not be the limiting factor in supplying this demand. In short, we have to ensure that our data warehouses will stand the test of time.

SAS is a registered trademark of SAS Institute Inc., Cary, NC, USA

Acknowledgements: The idea for the journal index was inspired by a conversation with my colleague Sean Chaffee.


More information

INTRODUCTION TO SAS HOW SAS WORKS READING RAW DATA INTO SAS

INTRODUCTION TO SAS HOW SAS WORKS READING RAW DATA INTO SAS TO SAS NEED FOR SAS WHO USES SAS WHAT IS SAS? OVERVIEW OF BASE SAS SOFTWARE DATA MANAGEMENT FACILITY STRUCTURE OF SAS DATASET SAS PROGRAM PROGRAMMING LANGUAGE ELEMENTS OF THE SAS LANGUAGE RULES FOR SAS

More information

Efficiency. Efficiency: Indexing. Indexing. Efficiency Techniques. Inverted Index. Inverted Index (COSC 488)

Efficiency. Efficiency: Indexing. Indexing. Efficiency Techniques. Inverted Index. Inverted Index (COSC 488) Efficiency Efficiency: Indexing (COSC 488) Nazli Goharian nazli@cs.georgetown.edu Difficult to analyze sequential IR algorithms: data and query dependency (query selectivity). O(q(cf max )) -- high estimate-

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

The SERVER Procedure. Introduction. Syntax CHAPTER 8

The SERVER Procedure. Introduction. Syntax CHAPTER 8 95 CHAPTER 8 The SERVER Procedure Introduction 95 Syntax 95 Syntax Descriptions 96 Examples 101 ALLOCATE SASFILE Command 101 Syntax 101 Introduction You invoke the SERVER procedure to start a SAS/SHARE

More information

Optimizing System Performance

Optimizing System Performance 243 CHAPTER 19 Optimizing System Performance Definitions 243 Collecting and Interpreting Performance Statistics 244 Using the FULLSTIMER and STIMER System Options 244 Interpreting FULLSTIMER and STIMER

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Basic Steps in Query Processing 1. Parsing and translation 2. Optimization 3. Evaluation 12.2

More information

Query Processing & Optimization

Query Processing & Optimization Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction

More information

SOS (Save Our Space) Matters of Size

SOS (Save Our Space) Matters of Size SOS (Save Our Space) Matters of Size By Matthew Pearce Amadeus Software Limited 2001 Abstract Disk space is one of the most critical issues when handling large amounts of data. Large data means greater

More information

An exercise in separating client-specific parameters from your program

An exercise in separating client-specific parameters from your program An exercise in separating client-specific parameters from your program Erik Tilanus The Netherlands WIILSU 2015 Milwaukee Do you recognize this? You write a 'one-time' program for one particular situation

More information

PROGRAMMING ROLLING REGRESSIONS IN SAS MICHAEL D. BOLDIN, UNIVERSITY OF PENNSYLVANIA, PHILADELPHIA, PA

PROGRAMMING ROLLING REGRESSIONS IN SAS MICHAEL D. BOLDIN, UNIVERSITY OF PENNSYLVANIA, PHILADELPHIA, PA PROGRAMMING ROLLING REGRESSIONS IN SAS MICHAEL D. BOLDIN, UNIVERSITY OF PENNSYLVANIA, PHILADELPHIA, PA ABSTRACT SAS does not have an option for PROC REG (or any of its other equation estimation procedures)

More information

Netsweeper Reporter Manual

Netsweeper Reporter Manual Netsweeper Reporter Manual Version 2.6.25 Reporter Manual 1999-2008 Netsweeper Inc. All rights reserved. Netsweeper Inc. 104 Dawson Road, Guelph, Ontario, N1H 1A7, Canada Phone: +1 519-826-5222 Fax: +1

More information

The Google File System

The Google File System The Google File System Sanjay Ghemawat, Howard Gobioff and Shun Tak Leung Google* Shivesh Kumar Sharma fl4164@wayne.edu Fall 2015 004395771 Overview Google file system is a scalable distributed file system

More information

HP Dynamic Deduplication achieving a 50:1 ratio

HP Dynamic Deduplication achieving a 50:1 ratio HP Dynamic Deduplication achieving a 50:1 ratio Table of contents Introduction... 2 Data deduplication the hottest topic in data protection... 2 The benefits of data deduplication... 2 How does data deduplication

More information

How TokuDB Fractal TreeTM. Indexes Work. Bradley C. Kuszmaul. Guest Lecture in MIT Performance Engineering, 18 November 2010.

How TokuDB Fractal TreeTM. Indexes Work. Bradley C. Kuszmaul. Guest Lecture in MIT Performance Engineering, 18 November 2010. 6.172 How Fractal Trees Work 1 How TokuDB Fractal TreeTM Indexes Work Bradley C. Kuszmaul Guest Lecture in MIT 6.172 Performance Engineering, 18 November 2010. 6.172 How Fractal Trees Work 2 I m an MIT

More information

Binary Encoded Attribute-Pairing Technique for Database Compression

Binary Encoded Attribute-Pairing Technique for Database Compression Binary Encoded Attribute-Pairing Technique for Database Compression Akanksha Baid and Swetha Krishnan Computer Sciences Department University of Wisconsin, Madison baid,swetha@cs.wisc.edu Abstract Data

More information

6. Results. This section describes the performance that was achieved using the RAMA file system.

6. Results. This section describes the performance that was achieved using the RAMA file system. 6. Results This section describes the performance that was achieved using the RAMA file system. The resulting numbers represent actual file data bytes transferred to/from server disks per second, excluding

More information

Physical Level of Databases: B+-Trees

Physical Level of Databases: B+-Trees Physical Level of Databases: B+-Trees Adnan YAZICI Computer Engineering Department METU (Fall 2005) 1 B + -Tree Index Files l Disadvantage of indexed-sequential files: performance degrades as file grows,

More information

Paper # Jazz it up a Little with Formats. Brian Bee, The Knowledge Warehouse Ltd

Paper # Jazz it up a Little with Formats. Brian Bee, The Knowledge Warehouse Ltd Paper #1495-2014 Jazz it up a Little with Formats Brian Bee, The Knowledge Warehouse Ltd Abstract Formats are an often under-valued tool in the SAS toolbox. They can be used in just about all domains to

More information

Database System Concepts

Database System Concepts Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth

More information

Create a Format from a SAS Data Set Ruth Marisol Rivera, i3 Statprobe, Mexico City, Mexico

Create a Format from a SAS Data Set Ruth Marisol Rivera, i3 Statprobe, Mexico City, Mexico PharmaSUG 2011 - Paper TT02 Create a Format from a SAS Data Set Ruth Marisol Rivera, i3 Statprobe, Mexico City, Mexico ABSTRACT Many times we have to apply formats and it could be hard to create them specially

More information

Hash-Based Indexing 165

Hash-Based Indexing 165 Hash-Based Indexing 165 h 1 h 0 h 1 h 0 Next = 0 000 00 64 32 8 16 000 00 64 32 8 16 A 001 01 9 25 41 73 001 01 9 25 41 73 B 010 10 10 18 34 66 010 10 10 18 34 66 C Next = 3 011 11 11 19 D 011 11 11 19

More information

All About SAS Dates. Marje Fecht Senior Partner, Prowerk Consulting. Copyright 2017 Prowerk Consulting

All About SAS Dates. Marje Fecht Senior Partner, Prowerk Consulting. Copyright 2017 Prowerk Consulting All About SAS Dates Marje Fecht Senior Partner, Prowerk Consulting Copyright 2017 Prowerk Consulting 1 SAS Dates What IS a SAS Date? And Why?? My data aren t stored as SAS Dates How can I convert How can

More information

TOP 10 (OR MORE) WAYS TO OPTIMIZE YOUR SAS CODE

TOP 10 (OR MORE) WAYS TO OPTIMIZE YOUR SAS CODE TOP 10 (OR MORE) WAYS TO OPTIMIZE YOUR SAS CODE Handy Tips for the Savvy Programmer SAS PROGRAMMING BEST PRACTICES Create Readable Code Basic Coding Recommendations» Efficiently choosing data for processing»

More information

Chapter 8 & Chapter 9 Main Memory & Virtual Memory

Chapter 8 & Chapter 9 Main Memory & Virtual Memory Chapter 8 & Chapter 9 Main Memory & Virtual Memory 1. Various ways of organizing memory hardware. 2. Memory-management techniques: 1. Paging 2. Segmentation. Introduction Memory consists of a large array

More information

Batch Jobs Performance Testing

Batch Jobs Performance Testing Batch Jobs Performance Testing October 20, 2012 Author Rajesh Kurapati Introduction Batch Job A batch job is a scheduled program that runs without user intervention. Corporations use batch jobs to automate

More information

DDS Dynamic Search Trees

DDS Dynamic Search Trees DDS Dynamic Search Trees 1 Data structures l A data structure models some abstract object. It implements a number of operations on this object, which usually can be classified into l creation and deletion

More information

Data Structure. IBPS SO (IT- Officer) Exam 2017

Data Structure. IBPS SO (IT- Officer) Exam 2017 Data Structure IBPS SO (IT- Officer) Exam 2017 Data Structure: In computer science, a data structure is a way of storing and organizing data in a computer s memory so that it can be used efficiently. Data

More information

University of Waterloo Midterm Examination Solution

University of Waterloo Midterm Examination Solution University of Waterloo Midterm Examination Solution Winter, 2011 1. (6 total marks) The diagram below shows an extensible hash table with four hash buckets. Each number x in the buckets represents an entry

More information

Tackling Unique Problems Using TWO SET Statements in ONE DATA Step. Ben Cochran, The Bedford Group, Raleigh, NC

Tackling Unique Problems Using TWO SET Statements in ONE DATA Step. Ben Cochran, The Bedford Group, Raleigh, NC MWSUG 2017 - Paper BB114 Tackling Unique Problems Using TWO SET Statements in ONE DATA Step Ben Cochran, The Bedford Group, Raleigh, NC ABSTRACT This paper illustrates solving many problems by creatively

More information

Short Note. The unwritten computing rules at SEP. Alexander M. Popovici, Dave Nichols and Dimitri Bevc 1 INTRODUCTION

Short Note. The unwritten computing rules at SEP. Alexander M. Popovici, Dave Nichols and Dimitri Bevc 1 INTRODUCTION Stanford Exploration Project, Report 80, May 15, 2001, pages 1?? Short Note The unwritten computing rules at SEP Alexander M. Popovici, Dave Nichols and Dimitri Bevc 1 INTRODUCTION This short note is intended

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Queries on streams

More information

Information Lifecycle Management for Business Data. An Oracle White Paper September 2005

Information Lifecycle Management for Business Data. An Oracle White Paper September 2005 Information Lifecycle Management for Business Data An Oracle White Paper September 2005 Information Lifecycle Management for Business Data Introduction... 3 Regulatory Requirements... 3 What is ILM?...

More information

Checking for Duplicates Wendi L. Wright

Checking for Duplicates Wendi L. Wright Checking for Duplicates Wendi L. Wright ABSTRACT This introductory level paper demonstrates a quick way to find duplicates in a dataset (with both simple and complex keys). It discusses what to do when

More information

Data Vault Partitioning Strategies WHITE PAPER

Data Vault Partitioning Strategies WHITE PAPER Dani Schnider Data Vault ing Strategies WHITE PAPER Page 1 of 18 www.trivadis.com Date 09.02.2018 CONTENTS 1 Introduction... 3 2 Data Vault Modeling... 4 2.1 What is Data Vault Modeling? 4 2.2 Hubs, Links

More information

. NO MORE MERGE - Alternative Table Lookup Techniques Dana Rafiee, Destiny Corporation/DDISC Group Ltd. U.S., Wethersfield, CT

. NO MORE MERGE - Alternative Table Lookup Techniques Dana Rafiee, Destiny Corporation/DDISC Group Ltd. U.S., Wethersfield, CT betfomilw tltlljri4ls. NO MORE MERGE - Alternative Table Lookup Techniques Dana Rafiee, Destiny Corporation/DDISC Group Ltd. U.S., Wethersfield, CT ABSTRACT This tutorial is designed to show you several

More information

From Manual to Automatic with Overdrive - Using SAS to Automate Report Generation Faron Kincheloe, Baylor University, Waco, TX

From Manual to Automatic with Overdrive - Using SAS to Automate Report Generation Faron Kincheloe, Baylor University, Waco, TX Paper 152-27 From Manual to Automatic with Overdrive - Using SAS to Automate Report Generation Faron Kincheloe, Baylor University, Waco, TX ABSTRACT This paper is a case study of how SAS products were

More information

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation Weiping Liao, Saengrawee (Anne) Pratoomtong, and Chuan Zhang Abstract Binary translation is an important component for translating

More information

How to Incorporate Old SAS Data into a New DATA Step, or What is S-M-U?

How to Incorporate Old SAS Data into a New DATA Step, or What is S-M-U? How to Incorporate Old SAS Data into a New DATA Step, or What is S-M-U? Andrew T. Kuligowski Nielsen Media Research Abstract / Introduction S-M-U. Some people will see these three letters and immediately

More information

SAS File Management. Improving Performance CHAPTER 37

SAS File Management. Improving Performance CHAPTER 37 519 CHAPTER 37 SAS File Management Improving Performance 519 Moving SAS Files Between Operating Environments 520 Converting SAS Files 520 Repairing Damaged Files 520 Recovering SAS Data Files 521 Recovering

More information

Base and Advance SAS

Base and Advance SAS Base and Advance SAS BASE SAS INTRODUCTION An Overview of the SAS System SAS Tasks Output produced by the SAS System SAS Tools (SAS Program - Data step and Proc step) A sample SAS program Exploring SAS

More information

Intro to DB CHAPTER 12 INDEXING & HASHING

Intro to DB CHAPTER 12 INDEXING & HASHING Intro to DB CHAPTER 12 INDEXING & HASHING Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing

More information

Data Warehousing. New Features in SAS/Warehouse Administrator Ken Wright, SAS Institute Inc., Cary, NC. Paper

Data Warehousing. New Features in SAS/Warehouse Administrator Ken Wright, SAS Institute Inc., Cary, NC. Paper Paper 114-25 New Features in SAS/Warehouse Administrator Ken Wright, SAS Institute Inc., Cary, NC ABSTRACT SAS/Warehouse Administrator 2.0 introduces several powerful new features to assist in your data

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join

More information

CSE 530A. B+ Trees. Washington University Fall 2013

CSE 530A. B+ Trees. Washington University Fall 2013 CSE 530A B+ Trees Washington University Fall 2013 B Trees A B tree is an ordered (non-binary) tree where the internal nodes can have a varying number of child nodes (within some range) B Trees When a key

More information

CS5412 CLOUD COMPUTING: PRELIM EXAM Open book, open notes. 90 minutes plus 45 minutes grace period, hence 2h 15m maximum working time.

CS5412 CLOUD COMPUTING: PRELIM EXAM Open book, open notes. 90 minutes plus 45 minutes grace period, hence 2h 15m maximum working time. CS5412 CLOUD COMPUTING: PRELIM EXAM Open book, open notes. 90 minutes plus 45 minutes grace period, hence 2h 15m maximum working time. SOLUTION SET In class we often used smart highway (SH) systems as

More information

Version 8 Base SAS Performance: How Does It Stack-Up? Robert Ray, SAS Institute Inc, Cary, NC

Version 8 Base SAS Performance: How Does It Stack-Up? Robert Ray, SAS Institute Inc, Cary, NC Paper 9-25 Version 8 Base SAS Performance: How Does It Stack-Up? Robert Ray, SAS Institute Inc, Cary, NC ABSTRACT This paper presents the results of a study conducted at SAS Institute Inc to compare the

More information

Clustering and Reclustering HEP Data in Object Databases

Clustering and Reclustering HEP Data in Object Databases Clustering and Reclustering HEP Data in Object Databases Koen Holtman CERN EP division CH - Geneva 3, Switzerland We formulate principles for the clustering of data, applicable to both sequential HEP applications

More information

Lecture 13. Lecture 13: B+ Tree

Lecture 13. Lecture 13: B+ Tree Lecture 13 Lecture 13: B+ Tree Lecture 13 Announcements 1. Project Part 2 extension till Friday 2. Project Part 3: B+ Tree coming out Friday 3. Poll for Nov 22nd 4. Exam Pickup: If you have questions,

More information

Summarizing Impossibly Large SAS Data Sets For the Data Warehouse Server Using Horizontal Summarization

Summarizing Impossibly Large SAS Data Sets For the Data Warehouse Server Using Horizontal Summarization Summarizing Impossibly Large SAS Data Sets For the Data Warehouse Server Using Horizontal Summarization Michael A. Raithel, Raithel Consulting Services Abstract Data warehouse applications thrive on pre-summarized

More information

Best Practice for Creation and Maintenance of a SAS Infrastructure

Best Practice for Creation and Maintenance of a SAS Infrastructure Paper 2501-2015 Best Practice for Creation and Maintenance of a SAS Infrastructure Paul Thomas, ASUP Ltd. ABSTRACT The advantage of using metadata to control and maintain data and access to data on databases,

More information

CAPACITY PLANNING FOR THE DATA WAREHOUSE BY W. H. Inmon

CAPACITY PLANNING FOR THE DATA WAREHOUSE BY W. H. Inmon CAPACITY PLANNING FOR THE DATA WAREHOUSE BY W. H. Inmon The data warehouse environment - like all other computer environments - requires hardware resources. Given the volume of data and the type of processing

More information

Guide Users along Information Pathways and Surf through the Data

Guide Users along Information Pathways and Surf through the Data Guide Users along Information Pathways and Surf through the Data Stephen Overton, Overton Technologies, LLC, Raleigh, NC ABSTRACT Business information can be consumed many ways using the SAS Enterprise

More information

Virtual Memory - Overview. Programmers View. Virtual Physical. Virtual Physical. Program has its own virtual memory space.

Virtual Memory - Overview. Programmers View. Virtual Physical. Virtual Physical. Program has its own virtual memory space. Virtual Memory - Overview Programmers View Process runs in virtual (logical) space may be larger than physical. Paging can implement virtual. Which pages to have in? How much to allow each process? Program

More information

If You Need These OBS and These VARS, Then Drop IF, and Keep WHERE Jay Iyengar, Data Systems Consultants LLC

If You Need These OBS and These VARS, Then Drop IF, and Keep WHERE Jay Iyengar, Data Systems Consultants LLC Paper 2417-2018 If You Need These OBS and These VARS, Then Drop IF, and Keep WHERE Jay Iyengar, Data Systems Consultants LLC ABSTRACT Reading data effectively in the DATA step requires knowing the implications

More information

Are Your SAS Programs Running You? Marje Fecht, Prowerk Consulting, Cape Coral, FL Larry Stewart, SAS Institute Inc., Cary, NC

Are Your SAS Programs Running You? Marje Fecht, Prowerk Consulting, Cape Coral, FL Larry Stewart, SAS Institute Inc., Cary, NC Paper CS-044 Are Your SAS Programs Running You? Marje Fecht, Prowerk Consulting, Cape Coral, FL Larry Stewart, SAS Institute Inc., Cary, NC ABSTRACT Most programs are written on a tight schedule, using

More information

Handling Numeric Representation SAS Errors Caused by Simple Floating-Point Arithmetic Computation Fuad J. Foty, U.S. Census Bureau, Washington, DC

Handling Numeric Representation SAS Errors Caused by Simple Floating-Point Arithmetic Computation Fuad J. Foty, U.S. Census Bureau, Washington, DC Paper BB-206 Handling Numeric Representation SAS Errors Caused by Simple Floating-Point Arithmetic Computation Fuad J. Foty, U.S. Census Bureau, Washington, DC ABSTRACT Every SAS programmer knows that

More information

Indexing: Overview & Hashing. CS 377: Database Systems

Indexing: Overview & Hashing. CS 377: Database Systems Indexing: Overview & Hashing CS 377: Database Systems Recap: Data Storage Data items Records Memory DBMS Blocks blocks Files Different ways to organize files for better performance Disk Motivation for

More information

Stephen M. Beatrous, SAS Institute Inc., Cary, NC John T. Stokes, SAS Institute Inc., Austin, TX

Stephen M. Beatrous, SAS Institute Inc., Cary, NC John T. Stokes, SAS Institute Inc., Austin, TX 1/0 Performance Improvements in Release 6.07 of the SAS System under MVS, ems, and VMS' Stephen M. Beatrous, SAS Institute Inc., Cary, NC John T. Stokes, SAS Institute Inc., Austin, TX INTRODUCTION The

More information

Are you Still Afraid of Using Arrays? Let s Explore their Advantages

Are you Still Afraid of Using Arrays? Let s Explore their Advantages Paper CT07 Are you Still Afraid of Using Arrays? Let s Explore their Advantages Vladyslav Khudov, Experis Clinical, Kharkiv, Ukraine ABSTRACT At first glance, arrays in SAS seem to be a complicated and

More information

Table Lookups: From IF-THEN to Key-Indexing

Table Lookups: From IF-THEN to Key-Indexing Table Lookups: From IF-THEN to Key-Indexing Arthur L. Carpenter, California Occidental Consultants ABSTRACT One of the more commonly needed operations within SAS programming is to determine the value of

More information

File Management By : Kaushik Vaghani

File Management By : Kaushik Vaghani File Management By : Kaushik Vaghani File Concept Access Methods File Types File Operations Directory Structure File-System Structure File Management Directory Implementation (Linear List, Hash Table)

More information