IMPLEMENTATION OF A HASHING ROUTINE IN SAS SOFTWARE

Craig Ray, ORI, Inc.

1. INTRODUCTION

Hashing may be the fastest generalized technique for table lookup. Table lookup refers to the cross reference of a parameter file based on the value of a variable in the primary (main) file.

While the concept of hashing is well documented in computer science texts and is often implemented in third generation programming language applications, hashing has received very little attention in the SAS System. This may be due to alternative non-procedural techniques of table lookup available in the SAS System such as MERGE and PROC FORMAT with the PUT function. (The term non-procedural means specifying what is to be done, not how to do it; therefore non-procedural table lookup techniques require less programming.) While other techniques are available in the SAS System, they should not be used interchangeably. Depending on circumstances, one technique may be clearly superior to all others in terms of efficiency with respect to CPU time and "wall clock" time.

Hashing may not be extensively used by SAS users because the implementation of hashing in the SAS System is not readily apparent. This paper bridges that gap. Not only is the concept of hashing presented, but the code to implement the concept in the SAS System is explained.

This paper is intended to be an extension to the paper, A Comparison of Table Lookup Techniques (reference 1). In that paper, three techniques for table lookup were discussed and the applicability of each was compared.

2. DEFINITIONS

There are five terms that are used extensively throughout this paper:

1) Main File - the primary file of interest that is being processed one observation at a time.
2) Lookup File - the file that will be referenced for all or some of the observations of the main file in order to obtain auxiliary information.
3) Key - the variables in common between the main file and the lookup file. In most applications, the key is unique for each observation in the lookup file. This is not necessarily true for the main file. (Even if not explicitly stated, for all lookup methods, the key may consist of more than one variable; i.e., the key variables may always be concatenated into one variable.)
4) Result of Lookup - the variables needed from the lookup file.
5) Seek - one comparison in attempting to perform a lookup. For example, if a table lookup was successful after looking at three observations, then the lookup required three seeks.

For example, the data in Figure 1 represent a mailing list as the main file and a file with unique records for each zip code as the lookup file. The main file consists of each potential customer's name and his/her address. The lookup file is a separately maintained file of auxiliary information relating to every zip code. Table lookup attempts to relate each observation of the main file with the corresponding observation of the lookup file by utilizing the key variable. In this example, for each observation in the mailing list (main file), the corresponding observation in the zip code file (lookup file) is related by ZIP (key) to obtain the resulting number of piano tuners indicated by the value of TUNERS (result of lookup).

3. CLASSICAL TECHNIQUES FOR TABLE LOOKUP

To illustrate the effectiveness of hashing as a technique for searching, it is necessary to first review alternative methods of searching. Other classical methods of searching include sequential and binary search.

Conceptually, the technique most easily understood is the sequential search. In sequential search, each observation of the lookup file is examined in sequence, until the desired observation is found. If the lookup file has N observations, then the sequential search will require, on average, N/2 seeks. Obviously, except for very small lookup files, this method will not be very efficient.

Another classical method of searching is the binary search.
If the lookup table is sorted by the key of the lookup, a binary search will successively divide the range of observations to be searched in half until the desired observation is found. For a lookup file with N observations, the desired observation will be located, on average, in INT(LOG2(N)) seeks. For example, for a lookup table with 10 observations, the desired observation will be found on average in three seeks; for a lookup table with 1,000 observations, the desired observation will be found on average in nine seeks; and for a lookup table with 1,000,000 observations, the sought after observation will be found on average in 19 seeks. This is extremely fast when compared with the sequential search.
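
The seek counts quoted above can be checked with a short sketch. It is written in Python rather than SAS purely for illustration, and every function name in it is hypothetical (none comes from the paper). The first two helpers encode the paper's estimates, N/2 and INT(LOG2(N)); the third counts the seeks made by an ordinary binary search so that the estimate can be compared with an actual run.

```python
import math


def sequential_seeks(n):
    """Paper's estimate for a successful sequential search: N / 2 seeks."""
    return n / 2


def binary_seeks(n):
    """Paper's estimate for binary search: INT(LOG2(N)) seeks."""
    return int(math.log2(n))


def binary_search_seeks(sorted_keys, key):
    """Run a binary search and return the number of seeks (comparisons
    against a table entry) it took, or None if the key is absent."""
    lo, hi = 0, len(sorted_keys) - 1
    seeks = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        seeks += 1
        if sorted_keys[mid] == key:
            return seeks
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


# Estimates for the three table sizes discussed in the text
# (the text quotes 3, 9, and 19 seeks for binary search).
for n in (10, 1_000, 1_000_000):
    print(n, sequential_seeks(n), binary_seeks(n))

# Measured average over every key of a 1,000-observation table.
keys = list(range(0, 2_000, 2))  # made-up sorted keys
measured = [binary_search_seeks(keys, k) for k in keys]
print(sum(measured) / len(measured), max(measured))
```

The worst case of the measured search is one seek more than the INT(LOG2(N)) figure, which is why the paper's number is best read as an average rather than a bound.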

The final classical method of searching and the subject of this paper is hashing. The method of hashing rearranges the observations in the lookup table such that the value of the key indicates where the observation is to be placed. If this rearrangement is done well, then the desired observation is found on average in less than 1.5 seeks. The number of seeks for a table with 1,000 observations will be approximately the same as for a lookup table with 1,000,000 observations. However, the lookup file must be rearranged (i.e., hashed) into an area approximately twice as large as the original lookup file for hashing to be efficient.

The use of binary search compared to hashing may be viewed as a trade-off between space and time. For binary search, the lookup file does not need any extra space as is necessary for hashing. However, each hashing lookup requires fewer seeks on average compared to the number of binary search seeks. Over the years, the computer industry has consistently made computer resources (e.g., disk and core) more plentiful, making hashing the more attractive alternative.

4. THE CONCEPT OF HASHING

To perform table lookup using hashing, an intermediate step to create a hash table from the lookup file is necessary. A hash table is constructed by performing an operation on the value of the key variable(s) of the lookup file. The result of this operation yields the address for each observation in the hash table (i.e., the observation number in the hash table SAS data set). To actually perform the lookup, the same operation is performed on the key variable(s) from the main file. This yields the address to look at in the hash table for the corresponding observation (i.e., like key values in the lookup file and main file will yield the same address).

In the simplest case, the key itself can be used as the address. In the example below, the lookup table, keyed on ZIP, maps into the hash table if ZIP is used as a pointer into the hash table.
Using the following SAS DATA step, the lookup may be performed for each observation in MAIN:

    SET HASHTBLE (KEEP=TUNERS) POINT=ZIP;

The lookup is successful in only one seek per observation in MAIN; however, this is at the expense of an extreme waste of space. A lookup file of merely three observations is rearranged into a hash table of 99,999 observations. The constructed hash table is depicted in Figure 2.

It is more prudent to use a function (i.e., "hash algorithm") of the key variable(s) as a pointer into the hash table, as in the example below. In this example, the MOD base 10 function is used on the key of each observation of the lookup file. The result of this function indicates the placement of the observation in the hash table. To execute the lookup, the same MOD base 10 function is performed on the key from the main file; the result of the operation is used to point to the hash table. This lookup step is illustrated by the following SAS DATA step:

    PTR = MOD(ZIP,10) + 1;
    SET HASHTBLE (KEEP=TUNERS) POINT=PTR;

In this case, the lookup would again be performed in only one seek per observation in MAIN; however, the hash table is considerably smaller than in the previous example. The constructed hash table is depicted in Figure 3.

Not all cases, however, can be expected to work this well. In particular, for any given function of the key, two or more observations from the lookup file will typically yield the same address. These are defined as collisions. If two or more observations "hash" to the same address, only one can go to that location in the hash table. The remainder are sent sequentially to an overflow table. Pointers are then maintained from the hash table to the overflow table. This is demonstrated in Figure 4. The lookup file contains four observations all yielding 3 as the address if the MOD base 10 function is used as the hash algorithm.
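
A collision of this kind is easy to reproduce in a few lines. The sketch below is Python, not SAS, and is only an illustration: the ZIP codes are made up (the values in Figure 4 are not used), and the names `address`, `lookup_zips`, and `collisions` are hypothetical. It applies the MOD base 10 hash algorithm from the text and reports which addresses receive more than one observation.

```python
from collections import defaultdict


def address(zip_code, base=10):
    """Hash algorithm from the text: MOD(ZIP, base) + 1.
    The + 1 makes the result a valid 1-based observation number."""
    return zip_code % base + 1


# Made-up ZIP codes: four of them end in 2, so all four hash to address 3.
lookup_zips = [10012, 14622, 20002, 22202, 30305]

# Group the keys by the address they hash to.
slots = defaultdict(list)
for z in lookup_zips:
    slots[address(z)].append(z)

# Any address holding more than one key is a collision.
collisions = {addr: zs for addr, zs in slots.items() if len(zs) > 1}
print(collisions)  # {3: [10012, 14622, 20002, 22202]}
```

Only one of the four colliding keys can occupy observation 3 of the hash table; the other three are the ones that must go to the overflow table.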
Arbitrarily, the last three are sent to the overflow table and the variables FIRST and LAST in the hash table point to those observations in the overflow table. When performing the lookup, if the hash algorithm for any observation in MAIN yields 3 as an address, the value of ZIP in MAIN is compared to the value of ZIP at observation 3 in the hash table. If they are equal, the lookup is successful in one seek. Otherwise, FIRST and LAST point to additional observations in the overflow table, whose hashed values also yielded the address 3. The overflow table is searched sequentially, between FIRST and LAST, until the sought after observation is found or all the observations pointed to are read.

To implement hashing, it is critical to be able to execute an operation on the value of the key variable(s) that yields an observation number. Fortunately, hashing is possible even if the keys are character values. It is only necessary to convert character representations to numeric values using the binary number equivalent of the ASCII or EBCDIC representation of the character string. (This conversion can be done using the DATA step INPUT function with an appropriate conversion format.) A numeric operation can be performed on this numeric equivalent.

Finally, for hashing to perform well, a good hash algorithm is required. If a poor hash algorithm is used, the net result may hardly be better than a sequential search. For example, if MOD base 1 is chosen as the hash algorithm, all observations of the lookup file would yield
the same address. As a result, the hash table would contain only one observation; the remainder would be sent to the overflow table. This would simulate the sequential search.

Ordinarily, the MOD function performs exceedingly well. The base of the MOD function determines the appropriate size of the hash table. It is recommended that this base be a prime number to avoid the possibility that an unusually large number of observations "hash" to the same address. To obtain reasonably fast searches (less than 1.5 seeks per lookup, on average), the hash table should be approximately twice the size of the lookup file. This can be adjusted depending on resource constraints. A larger hash table may be created by increasing the base of the MOD function if speed of the lookup is absolutely essential. The size of the hash table may be decreased at the cost of slower lookups if space is at a premium.

Many standard texts (see reference 2) on data structure provide a more thorough treatment of the subject of hashing.

5. IMPLEMENTATION OF HASHING IN THE SAS SYSTEM

The hash table should be stored as a SAS data set, which is stored on disk. This is the major difference between the implementation of hashing in traditional third generation programming languages and in the SAS System. In traditional languages, the hash table is typically stored in core as a series of parallel arrays. This difference has its trade-offs. Random access in core is much faster than random access from disk (using SET with the POINT option in SAS); however, the number of random accesses should be less than 1.5 on average per lookup if the hash table has been efficiently constructed. Typically, disk space is much more plentiful than space in core. Thus, space is not as serious a constraint in SAS, even on microcomputers, as it is in other languages.

The major obstacle to implementing hashing in SAS is creating the hash table. When creating the hash table, it is necessary to
calculate the observation number that indicates the location of each observation of the lookup file. There is no facility in SAS to "OUTPUT with a POINT option." A design that solves this problem is depicted in Figure 5. A preliminary DATA step performs the hash algorithm on the key variable(s) and stores this in the variable, ADDRESS. The output data set, HASHVAR, is then sorted by ADDRESS. The last DATA step outputs two SAS data sets: HASHTBLE and OVERFLOW. The DATA step ensures that the observation number of HASHTBLE and the variable ADDRESS are equal. Where there is more than one observation in HASHVAR with the same value of ADDRESS, all but the last are sent to OVERFLOW. Where there are gaps in the values of ADDRESS in HASHVAR, blank observations are output to HASHTBLE. This ensures that ADDRESS (the result of the hash algorithm) actually corresponds to the observation number in HASHTBLE.

The code to implement hashing in the SAS System has been divided into two parts: creating the hash table, and performing lookups on the hash table. The code to create the hash table is contained in Figure 6. A preliminary DATA step creates a SAS data set, HASHVAR, that contains the hashing address. The second argument of the MOD function will generally change according to the number of observations in the lookup file. HASHVAR is then sorted by the calculated address and the generalized macro, %HASHTBLE, is called to create the hash table and overflow table. (Note: as coded in the macro %HASHTBLE, it is required that the variable containing the address actually be called ADDRESS.)

The code to actually perform the lookup on the hash table is contained in Figure 7. The programmer must set up the DATA step, perform the hash function on the key of the lookup, and put the result in a variable, ADDRESS. (Note: the function must be exactly the same as the function used when creating the hash table; therefore, the function may
be placed in a macro that is called in both cases to ensure consistency.) The macro %HASHFIND is then called to search the hash and overflow tables. (Note: the code generated by %HASHFIND is only a portion of a DATA step and does not set up the DATA step.) When the code generated by %HASHFIND has finished, the program knows if the lookup was successful by comparing the variables KEY and TBLE_KEY for equality. Any programming statements may follow the call to macro %HASHFIND.

6. WHEN TO USE HASHING IN THE SAS SYSTEM

While hashing is the fastest generalized technique for table lookup, it must be placed in its proper perspective within the SAS System. The non-procedural techniques available, namely PROC FORMAT with the PUT function and MERGE, are more appropriate under certain circumstances; under other conditions, a SAS coded binary search would be the most appropriate.

As a "rule of thumb", the most appropriate method for table lookup in the SAS System can be determined as a function of the number of observations in both the main and lookup files. This is depicted in Figure 8. SAS searches formats very rapidly so that as long as the lookup table is not too large, PROC FORMAT with the PUT function is the best method. Otherwise, if the main file is rather small (i.e., very few lookups are required), then a SAS coded binary search is likely to be the best method. The overhead costs associated with just creating the hash table will be greater than the cost of a few binary searches of a sorted lookup table.

As can be seen, hashing is a competing method with MERGE. Both perform reasonably well with a fairly large number of observations in both the main and lookup files (e.g., 30,000 observations
in both). MERGE, however, is easier to code. Using MERGE may require an extra DATA step. FIRST. and LAST. processing may not be possible using MERGE because the main file ordinarily must be resorted by the keys of the lookup file. In this case, hashing may be preferable because it does not require resorting the main file. Additionally, if the main file is very large (e.g., over 1,000,000 observations), then resorting the data set as is required by MERGE will be prohibitively expensive. In this case, hashing is likely to be the best technique.

In conclusion, hashing in the SAS System is not a technique to be used by programmers for "run-of-the-mill" table lookup applications. Rather, it is a tool to be employed for larger applications as circumstances dictate.

7. ACKNOWLEDGEMENTS

The author is indebted to Stephen Weiss, who helped to develop the approach for implementing hashing in the SAS System; Bob Pulgino, who volunteered use of his Apple Macintosh for the preparation of the slides used for presentation; and Tina Feggans for preparing this manuscript.

The author can be contacted at:

    ORI, Inc.
    Suite Indiana Avenue, N.W.
    Washington, D.C. (202)

REFERENCES

1) Ray, Craig (1987), "A Comparison of Table Lookup Techniques", Proceedings of the Twelfth SAS User's Group International Conference, Cary, NC: SAS Institute Inc.
2) Flores, Ivan. Data Structure and Management, 2nd Edition, Prentice Hall, Inc.

SAS is a registered trademark of SAS Institute Inc., Cary, NC, USA.

FIG 1 Sample Data

MAIN FILE (ZIP values are not legible in the source)

    NAME               ZIP     POLITICAL PARTY
    ADAMS, JOHN                R
    COLUMBUS, CHRIS            I
    COOPER, PAULA              R
    DALTON, JAMES              D
    DEBBS, EUGENE              S
    LORAN, NANCY               D
    MARX, KARL                 C
    PORTER, ALAN               R
    SOBER, TOM                 D
    THORPE, MARTHA             I

LOOKUP FILE (ZIP values and one TUNERS value are not legible in the source)

    ZIP     COUNTY       TUNERS
            MANHATTEN    20
            ARLINGTON
            MONROE       15

    KEY: ZIP
    RESULT OF LOOKUP: any combination of other lookup file variables

FIG 2 Create a Hash Table

Use the value of the key itself as an observation number. The figure maps the Lookup Table (Obs, Zip, Tuners) into the Hash Table (Obs, Zip, Tuners); the table values are not legible in the source.

    SET HASHTBLE(KEEP=TUNERS) POINT=ZIP;
    RUN;

FIG 3 Create a Hash Table

Use a function of the key as an observation number. Example: use MOD base 10, Obs = MOD(ZIP,10) + 1. The figure maps the Lookup Table (Obs, Zip, Tuners) into the Hash Table (Obs, Zip, Tuners); the table values are not legible in the source.

    PTR = MOD(ZIP,10) + 1;
    SET HASHTBLE POINT=PTR;
    RUN;

FIG 4 Collisions

Hash algorithm: Obs = MOD(ZIP,10) + 1. The figure maps the Lookup Table (Obs, Zip, Tuners) into the Hash Table (Obs, Zip, Tuners, First, Last) and the Overflow Table (Obs, Zip, Tuners); the table values are not legible in the source.

FIG 5 Design for Creating Hash Table in SAS

    LOOKUP                contains: KEY, Result of Lookup
      |
    DATA step             perform hash algorithm on key
      |
    HASHVAR               contains: KEY, Result of Lookup, ADDRESS
      |
    PROC SORT             BY ADDRESS
      |
    Sorted HASHVAR
      |
    DATA step             create HASHTBLE and OVERFLOW
      |
      +-- HASHTBLE        contains: TBLE_KEY, Result of Lookup, ADDRESS, FIRST, LAST
      +-- OVERFLOW        contains: TBLE_KEY, Result of Lookup, ADDRESS
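
The design in Figure 5 can be sketched outside SAS to make the data flow concrete. The following is a Python illustration under stated assumptions, not the paper's macro: the function name `build_tables`, the dictionary layout, and the sample ZIP/tuner values are all hypothetical. It follows the same three steps (hash each key, sort by address, then split the sorted rows into a hash table and an overflow table), sending all but the last row of each colliding group to overflow and filling gaps with empty slots so that position in the hash table always equals the address.

```python
from itertools import groupby


def build_tables(lookup, base):
    """lookup: list of (key, result) pairs; base: modulus of the hash algorithm.

    Returns (hashtbl, overflow). hashtbl[i] describes observation i + 1 and
    is either None (empty slot) or a dict with tble_key, result, first, last,
    where first/last are 1-based observation numbers in overflow (or None).
    """
    # Step 1: perform the hash algorithm on each key (the HASHVAR data set).
    hashvar = [(key % base + 1, key, result) for key, result in lookup]
    # Step 2: sort by ADDRESS (stable, so input order is kept within a group).
    hashvar.sort(key=lambda row: row[0])

    hashtbl, overflow = [], []
    # Step 3: one pass over the sorted rows, one group per distinct address.
    for addr, group in groupby(hashvar, key=lambda row: row[0]):
        rows = list(group)
        # Fill gaps so that position in hashtbl always equals ADDRESS.
        while len(hashtbl) < addr - 1:
            hashtbl.append(None)
        first = last = None
        # All but the last row of a colliding group go to the overflow table.
        for _, key, result in rows[:-1]:
            overflow.append({"tble_key": key, "result": result})
            if first is None:
                first = len(overflow)
            last = len(overflow)
        # The last row of the group occupies the slot in the hash table.
        _, key, result = rows[-1]
        hashtbl.append({"tble_key": key, "result": result,
                        "first": first, "last": last})
    return hashtbl, overflow


# Made-up lookup file: three keys collide at address 3, one lands at address 6.
hashtbl, overflow = build_tables(
    [(10012, 20), (14622, 15), (20002, 7), (30305, 4)], base=10)
for obs, slot in enumerate(hashtbl, start=1):
    print(obs, slot)
print(overflow)
```

As in the SAS design, the hash table only extends to the largest address actually used, so a lookup must first check that its address does not exceed the number of observations in the table.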

FIG 6

    /* Macro definition. It must be compiled before the calling code below. */
    %MACRO HASHTBLE;
    %*************************************************;
    %* THIS MACRO OPERATES ON A SAS DATASET WHICH    *;
    %* HAS HAD ITS KEYS PUT THRU A HASH ALGORITHM    *;
    %* AND SORTED BY THE RESULT OF THE HASH          *;
    %* ALGORITHM. THIS MACRO THEN CREATES A SAS      *;
    %* DATA SET, HASHTBLE, WHICH CONTAINS ONE OBS    *;
    %* FOR EACH UNIQUE HASH ADDRESS. DUPLICATE       *;
    %* ADDRESSES ARE SENT TO SAS DATASET OVERFLOW.   *;
    %* POINTERS ARE THEN MAINTAINED FROM HASHTBLE    *;
    %* TO OVERFLOW. OBS IN HASHTBLE WITH NO          *;
    %* MAPPING ARE FILLED IN WITH KEY = MISSING      *;
    %* TO INDICATE NO FIND.                          *;
    %* INPUT DATA SET HASHVAR CONTAINS:              *;
    %*    KEY                                        *;
    %*    ADDRESS                                    *;
    %*    RESULT OF LOOKUP                           *;
    %* OUTPUT HASHTBLE CONTAINS:                     *;
    %*    TBLE_KEY                                   *;
    %*    ADDRESS                                    *;
    %*    FIRST & LAST (POINTERS TO OVERFLOW)        *;
    %*    RESULT OF LOOKUP                           *;
    %* OUTPUT OVERFLOW CONTAINS:                     *;
    %*    TBLE_KEY                                   *;
    %*    ADDRESS                                    *;
    %*    RESULT OF LOOKUP                           *;
    %* WRITTEN BY: CRAIG RAY, ORI, INC.              *;
    %*************************************************;
    %PUT %STR( );
    %PUT NOTE: *** MACRO HASHTBLE HAS BEGUN;
    DATA HASHTBLE OVERFLOW(DROP=FIRST LAST);
      LENGTH TBLE_KEY $ 11;
      DROP OVEROBS HASHOBS KEY;
      RETAIN FIRST;
      SET HASHVAR;
      BY ADDRESS;
      /* FILL GAPS SO THAT OBS NUMBER IN HASHTBLE EQUALS ADDRESS */
      DO WHILE(ADDRESS > HASHOBS+1);
        TBLE_KEY = ' ';
        OUTPUT HASHTBLE;
        HASHOBS + 1;
      END;
      TBLE_KEY = KEY;
      IF FIRST.ADDRESS AND LAST.ADDRESS THEN DO;
        /* SINGLE MAP - NO OVERFLOW */
        OUTPUT HASHTBLE;
        HASHOBS + 1;
      END; /* SINGLE MAP - NO OVERFLOW */
      ELSE IF FIRST.ADDRESS THEN DO;
        /* SEND TO OVERFLOW AND INITIALIZE FIRST */
        OVEROBS + 1;
        FIRST = OVEROBS;
        OUTPUT OVERFLOW;
      END; /* SEND TO OVERFLOW AND INITIALIZE FIRST */
      ELSE IF LAST.ADDRESS THEN DO;
        /* OUTPUT TO HASHTBLE */
        LAST = OVEROBS;
        OUTPUT HASHTBLE;
        HASHOBS + 1;
        FIRST = .;
      END; /* OUTPUT TO HASHTBLE */
      ELSE DO;
        /* OUTPUT TO OVERFLOW */
        OVEROBS + 1;
        OUTPUT OVERFLOW;
      END; /* OUTPUT TO OVERFLOW */
    RUN;
    /* %TESTPRNT IS CALLED HERE BUT IS NOT DEFINED IN THIS LISTING */
    %TESTPRNT(IN=HASHTBLE)
    %TESTPRNT(IN=OVERFLOW)
    %PUT %STR( );
    %PUT NOTE: *** MACRO HASHTBLE HAS FINISHED;
    %MEND HASHTBLE;

    /* Calling code: hash the keys, sort by address, build the tables. */
    DATA HASHVAR;
      SET LOOKUP;
      /* The listing shows MOD(KEY,2347); + 1 is added here so that  */
      /* ADDRESS is never 0 and is always a valid observation number, */
      /* as in the MOD(ZIP,10) + 1 examples in the text.              */
      ADDRESS = MOD(KEY,2347) + 1;
    RUN;
    PROC SORT DATA=HASHVAR;
      BY ADDRESS;
    RUN;
    %HASHTBLE

FIG 7

    /* Macro definition. It must be compiled before the calling code below. */
    %MACRO HASHFIND;
    %*************************************************;
    %* THIS MACRO PERFORMS THE ACTUAL LOOKUP ON A    *;
    %* HASH TABLE. IT ASSUMES THAT A VARIABLE        *;
    %* NAMED ADDRESS HAS BEEN CREATED CONTAINING     *;
    %* THE OBSERVATION TO BE REFERENCED IN SAS       *;
    %* DATA SET HASHTBLE. THE MACRO MAY THEN GO      *;
    %* TO SAS DATA SET OVERFLOW BASED ON POINTERS    *;
    %* IN HASHTBLE.                                  *;
    %*                                               *;
    %* WRITTEN BY: CRAIG RAY, ORI, INC.              *;
    %*************************************************;
    %PUT %STR( );
    %PUT NOTE: *** MACRO HASHFIND HAS BEGUN;
    IF ADDRESS <= THASHOBS THEN DO;
      /* SEARCH HASHTBLE AND/OR OVERFLOW */
      SET HASHTBLE POINT=ADDRESS NOBS=THASHOBS;
      IF KEY NE TBLE_KEY AND TBLE_KEY NE ' ' AND LAST NE . THEN DO;
        /* SEARCH OVERFLOW */
        OVERPTR = FIRST;
        DO UNTIL(OVERPTR > LAST OR KEY = TBLE_KEY);
          IF OVERPTR <= TOVEROBS THEN DO;
            /* PERFORM SETS */
            SET OVERFLOW POINT=OVERPTR NOBS=TOVEROBS;
            OVERPTR + 1;
          END; /* PERFORM SETS */
          ELSE OVERPTR = LAST + 1; /* FORCE END OF LOOP */
        END;
      END; /* SEARCH OVERFLOW */
    END; /* SEARCH HASHTBLE AND/OR OVERFLOW */
    %PUT %STR( );
    %PUT NOTE: *** MACRO HASHFIND HAS FINISHED;
    %MEND HASHFIND;
    %PUT %STR( );
    %PUT NOTE: *** MACRO HASHFIND NOW LOADED;

    /* Calling code: hash each main-file key, then search the tables. */
    DATA FIND;
      SET MAIN;   /* statement illegible in the source; MAIN is assumed */
      /* Must be exactly the function used when creating the hash table. */
      ADDRESS = MOD(KEY,2347) + 1;
      %HASHFIND
      /* CHECK KEY = TBLE_KEY TO TELL IF OBS FOUND */
      IF KEY = TBLE_KEY THEN OUTPUT;
    RUN;
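
The search logic of the lookup step can also be traced in a few lines of Python. This is a hedged sketch of the same idea, not a translation of %HASHFIND: the function name `hash_find`, the table layout, and the sample values are assumptions made for illustration, and the two tables are written out by hand so the block stands alone. It hashes the key, checks the addressed slot, and only then walks the overflow table between FIRST and LAST, counting seeks along the way.

```python
# Hand-built tables for a MOD base 10 hash algorithm (made-up values).
# Position i in HASHTBL is observation i + 1; None marks an empty slot.
HASHTBL = [
    None,
    None,
    {"tble_key": 20002, "result": 7, "first": 1, "last": 2},
    None,
    None,
    {"tble_key": 30305, "result": 4, "first": None, "last": None},
]
OVERFLOW = [
    {"tble_key": 10012, "result": 20},
    {"tble_key": 14622, "result": 15},
]


def hash_find(key, hashtbl, overflow, base):
    """Return (result, seeks); result is None when the key is not found."""
    addr = key % base + 1               # same hash algorithm as at build time
    if addr > len(hashtbl):             # address beyond the end of the table
        return None, 0
    slot = hashtbl[addr - 1]
    seeks = 1                           # one seek to read the addressed slot
    if slot is None:                    # empty slot: nothing hashed here
        return None, seeks
    if slot["tble_key"] == key:         # found directly in the hash table
        return slot["result"], seeks
    if slot["last"] is not None:        # follow FIRST..LAST into overflow
        for ptr in range(slot["first"], slot["last"] + 1):
            seeks += 1
            row = overflow[ptr - 1]
            if row["tble_key"] == key:
                return row["result"], seeks
    return None, seeks


print(hash_find(20002, HASHTBL, OVERFLOW, 10))  # (7, 1)
print(hash_find(14622, HASHTBL, OVERFLOW, 10))  # (15, 3)
print(hash_find(99992, HASHTBL, OVERFLOW, 10))  # (None, 3)
```

The second and third calls show the cost of collisions: a key that sits at the end of an overflow chain, or that is absent from a crowded address, takes as many seeks as a short sequential search.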

FIG 8 Regions labeled Sort/Merge, Hashing, and Binary Search, plotted against Number of Obs. in Lookup File (horizontal axis). The vertical axis label and the tick values are not legible in the source.


More information

A Comparison of Table Lookup Techniques. Craig Ray, ORI, Inc.

A Comparison of Table Lookup Techniques. Craig Ray, ORI, Inc. A Comparison of Table Lookup Techniques Craig Ray, ORI, Inc. 1 INTRODUCTION This paper presents table lookup procedures and shows how to use them efficiently with respect to CPU time. Table lookup can

More information
