

A Performance Study of Hashing Functions for Hardware Applications

M. V. Ramakrishna, E. Fu and E. Bahcekapili
Department of Computer Science, Michigan State University, East Lansing, MI

Proc. ICCI 94, Int. Conf. on Computing and Information

Abstract

Hashing is used extensively in hardware applications such as page tables for address translation. Although hashing has been studied extensively for file organization, there is little literature on its use in hardware. More specifically, there is no study of the practical performance of the hashing functions used. In the literature we find bit extraction and exclusive ORing hashing functions used, but no mention of the performance of these functions. Moreover, the performance of the hashing functions in relation to the theoretical performance of hashing schemes is not addressed. In this paper we study the practical performance of a particular class of hashing functions. Our results show that by choosing functions randomly from this class, which can be readily implemented in hardware, we can achieve the analytically predicted performance of hashing schemes with real-life data.

1 Introduction

Hashing is a widely used technique for organizing tables, and it also finds several applications in hardware. For example, hash tables are used to implement page tables in many modern architectures [1], such as the IBM System/38 [2] and Monads II [3]. Thakkar and Knowles proposed a method of address translation using parallel hashing hardware [4]. Although hashing is widely used in hardware, there is little literature on the subject, and in particular we have not been able to find any paper dealing with the performance of hashing functions suitable for hardware implementation. In all the hardware applications of hashing we have encountered, the hash address is obtained by simple bit extraction or exclusive ORing of bit segments. There is no report of the performance of any of these hashing functions. This is hardly surprising, since even for software applications there is not much literature on the performance of hashing functions [5], [6].

In this paper, we study the performance of a particular class of hashing functions which can be readily implemented in hardware. This class of hashing functions was shown to be universal$_2$ by Carter and Wegman [7]. We show that choosing functions at random from this class gives exactly the theoretically predicted performance of hashing. We provide a comparison of search lengths for two different hashing schemes. We assume that the reader is familiar with the basics of hashing and the related terminology [5], [8].

Knuth expressed his fears about using hashing in [8, p. 540] by concluding his chapter on hashing with: "Finally, we need a great deal of faith in probability theory when we use hashing methods, since they are efficient only on the average, while their worst case is terrible! As in the case of random number generators, we are never completely sure that a hash function will perform properly when it is applied to a new set of data. Therefore scatter storage would be inappropriate for certain real-time applications such as air traffic control, where people's lives are at stake". Later, in 1981, Gonnet showed that such fears were baseless, since the probability of the worst case is ridiculously small [9]. He proposed that the expected length of the longest probe sequence (rather than the worst possible length of the longest probe sequence) should be the measure of the "worst case" of hashing.

The length of the longest probe sequence, abbreviated llps, and the expected length of the longest probe sequence, E(llps), can be explained as follows. One key, amongst all the keys hashed, has the maximum search length. This search length is the llps. The average value of llps over a large number of different hash tables (all with the same parameters) is the expected llps, E(llps). This E(llps) is much smaller than the O(n) worst case. Our most significant results concern the expected length of the longest probe sequence and the narrow distribution of the length of the longest probe sequence. They agree closely with the analytical performance measures of Gonnet [9] and Larson [10], and they show that the "worst case" fears of hashing are baseless.

The rest of the paper is organized as follows. In the next section we provide the background of hashing in hardware. Section 3 introduces the class of hashing functions $H_3$ and presents the results of the performance study. The last section provides conclusions and a discussion of future work.

2 Background

We give an overview of the hardware applications of hashing. Irrelevant details of the hardware are omitted in many cases for brevity.

Braidt and Taylor describe a memory system that has two independently addressable cards [11]. There are circuits which enable refresh of one card while normal memory access is made to the other card. A simple hash circuit is used to cause memory addresses to occur randomly between the two memory cards. Benhase describes the use of hash circuits and a directory for data access from a storage hierarchy [12]. A similar "hash and chain" technique, implemented in hardware to address a cache on direct-access storage devices, was used by Robinson and Taylor [13].

McKenney used a "dictionary hash" technique to quickly determine stochastically the number of unique labels in a network when presented with a large group of labeled events [14]. The label consists of the source and the destination addresses. At the beginning of each measurement period, a global sequence number is incremented. When a new label arrives, this sequence number is inserted into the corresponding hash location. If a label is repeated, there will be a match between the number retrieved from the hash address and the global sequence number. The hardware implementation of the scheme shown in the paper uses three distinct hardware hash units. The paper does not give details of the hashing functions, only that "the possible candidates for the hashing functions include CRCs, checksums ...". Nor does the paper discuss how to satisfy the requirement that those hash functions produce "statistically independent" hash addresses. The hashing functions we study in the next section readily satisfy this requirement.

Several papers discuss address translation hardware for virtual memory implementation. Houdek and Mitchell discuss the address translation scheme used by the IBM System/38 [2]. The virtual page addresses are hashed into a page directory table. When a match is found, the corresponding physical page number is obtained. Cocke and Worley described the hashing technique used: bit selection (the selection of particular bits) from the virtual page address [15]. No details of the performance of the hashing functions are provided (nor do Ramamohanarao and Sacks-Davis mention any particular hashing function [16]).
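The two kinds of hashing functions reported in these hardware designs, bit extraction and exclusive ORing of bit segments, can be sketched as follows. This is a minimal illustration and not code from any of the cited designs: the function names, the 32-bit key width, the 10-bit address width and the two sample page addresses are all assumptions made here.

```python
def bit_extract(key: int, low: int, width: int) -> int:
    """Select `width` contiguous bits of `key`, starting at bit `low` (bit 0 = LSB)."""
    return (key >> low) & ((1 << width) - 1)


def xor_fold(key: int, key_bits: int, width: int) -> int:
    """XOR together consecutive `width`-bit segments of a `key_bits`-bit key."""
    mask = (1 << width) - 1
    address = 0
    for shift in range(0, key_bits, width):
        address ^= (key >> shift) & mask
    return address


# Two hypothetical 32-bit virtual page addresses that differ only in their high bits.
page_a = 0x0001_2345
page_b = 0x0002_2345

# Bit extraction of the low 10 bits ignores the differing bits, so the keys collide.
print(bit_extract(page_a, 0, 10), bit_extract(page_b, 0, 10))

# XOR folding mixes every segment into the address, so these two keys separate.
print(xor_fold(page_a, 32, 10), xor_fold(page_b, 32, 10))
```

Both functions are fixed: a key set that happens to agree on the selected bits, or whose segments cancel under XOR, collides on every run. That is the motivation for choosing a function at random from a class instead.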
Thakkar and Knowles propose a parallel hashing hardware scheme for address translation [4]. Parallel hardware is used to search a whole bucket in parallel. They suggested hash-bit extraction for the hashing function and quadratic probing for collision resolution. No study of the performance of the hashing functions or the probing functions is mentioned. Bryant proposed an extendible hashing scheme for line-oriented paging stores [17]. Chaining was used for collision resolution. The paper does not discuss the hashing function used or its performance. Ida and Goto studied the performance of the parallel hashing address translation scheme with key deletion [18]. They used uniform hashing to resolve collisions. They also did not discuss the performance of any particular hashing function.

In the next section we introduce a class of hashing functions, called $H_3$ by Carter and Wegman [7]. We show that by choosing functions randomly from this class, the theoretically predicted performance of hashing schemes can be achieved in practice on real-life data sets. These hashing functions can be used with any hashing scheme and can be readily implemented in hardware.

3 Performance of a class of hashing functions

Let $A = \{0, 1, 2, \ldots, a-1\}$ be the key space and $B = \{0, 1, \ldots, m-1\}$ the address space. Let $I$ be the given key set, $I = \{x_1, x_2, \ldots, x_n\}$, $I \subseteq A$. There are a total of $m^n$ possible functions from $I$ to $B$. The usual assumption in almost all papers analyzing the performance of hashing schemes corresponds to the expected performance value over all the $m^n$ possible mapping/hashing functions. (The assumption is usually stated as: the probability of a key hashing to a particular location is $1/m$, independent of the outcome of the other keys.) Our hypothesis is that by choosing functions at random from the class $H_3$, the analytically predicted performance can be achieved in practice on real-life files.
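The uniformity assumption can be written out explicitly. As a sketch, let $N_s$ denote the number of the $n$ keys that hash to a fixed address $s$; this symbol is introduced here for illustration and does not appear in the paper. If each key lands on $s$ with probability $1/m$ independently of the others, then $N_s$ is binomially distributed:

```latex
\Pr[h(x_k) = s] = \frac{1}{m},
\qquad
\Pr[N_s = r] = \binom{n}{r} \left(\frac{1}{m}\right)^{r} \left(1 - \frac{1}{m}\right)^{n-r},
\qquad
E[N_s] = \frac{n}{m}.
```

Analytical search-length results of the kind used for comparison later are derived from occupancy distributions of this form.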

The class of functions $H_3$

We redefine $A$ and $B$ so that their cardinalities are powers of 2. Let

$$A = \{0, 1, \ldots, 2^i - 1\} \quad \text{and} \quad B = \{0, 1, \ldots, 2^j - 1\}.$$

Here $i$ is the number of bits in the key and $j$ is the number of bits in the address. The class $H_3$ is defined as follows. Let $Q$ denote the set of all $i \times j$ boolean matrices. For a given $q \in Q$ and $x \in A$, let $q(k)$ be the $k$th row of the matrix $q$ and $x_k$ the $k$th bit of $x$. The hashing function $h_q(x) : A \to B$ is defined as

$$h_q(x) = x_1 \cdot q(1) \oplus x_2 \cdot q(2) \oplus \cdots \oplus x_i \cdot q(i),$$

where $\cdot$ denotes the binary AND operation and $\oplus$ the exclusive OR operation. The class $H_3$ is the set $\{h_q \mid q \in Q\}$. The following example illustrates the hashing functions and hash address calculations.

Example:

Let $i$ be 8 and $j$ be 3. Then the key space is $A = \{0, \ldots, 255\}$ and the address space is $B = \{0, \ldots, 7\}$. We randomly choose an $8 \times 3$ boolean matrix $q$. Then the hash addresses for keys 53 and 100 are

$$h_q(53) = h_q(00110101) = q(3) \oplus q(4) \oplus q(6) \oplus q(8) = 110 = 6 \text{ (decimal)},$$

$$h_q(100) = h_q(01100100) = q(2) \oplus q(3) \oplus q(6) = 010 = 2 \text{ (decimal)}.$$

This class of hashing functions is universal$_2$ [7]. A class $H$ of hashing functions is said to be universal$_2$ if no pair of distinct keys collides under more than $|H|/m$ of the functions in the class. Here $|H|$ is the number of hashing functions in $H$ and $m$ is the size of the address space.

Hashing functions from this class can be easily implemented in hardware. Figure 1 shows a circuit implementation. When presented with the key $x_1 x_2 x_3$, the circuit outputs the hash address $a_1 a_2$. The matrix $q$ can be generated in software and then loaded into the bank of registers. The circuit is self-explanatory and we will not elaborate further.

Figure 1. Hash address generator: key = $x_1 x_2 x_3$, hash matrix elements = $q_{ij}$, hash address = $a_1 a_2$.
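The definition of $h_q$ and the universal$_2$ bound can be checked with a short software sketch. It rests on several assumptions: `Q_EXAMPLE` is a hypothetical matrix, not the one used in the paper, chosen only so that keys 53 and 100 hash to the addresses 6 and 2 given in the example. The function names are illustrative, and rows are stored as integers with $x_1$ taken as the most significant key bit.

```python
from itertools import product
from typing import Sequence


def h3_hash(key: int, rows: Sequence[int], key_bits: int) -> int:
    """XOR together the rows of q selected by the 1-bits of `key`.

    rows[k - 1] holds row q(k) as an integer; x_1 is the most significant
    of the `key_bits` key bits, matching the worked example.
    """
    address = 0
    for k in range(1, key_bits + 1):
        x_k = (key >> (key_bits - k)) & 1
        if x_k:  # x_k AND q(k) equals q(k) when x_k = 1, and 0 otherwise
            address ^= rows[k - 1]
    return address


# Hypothetical 8 x 3 matrix (NOT the paper's), chosen to reproduce its two addresses.
Q_EXAMPLE = [0b101, 0b100, 0b111, 0b011, 0b110, 0b001, 0b010, 0b011]

print(h3_hash(53, Q_EXAMPLE, 8))   # q(3) ^ q(4) ^ q(6) ^ q(8)
print(h3_hash(100, Q_EXAMPLE, 8))  # q(2) ^ q(3) ^ q(6)


def max_pair_collisions(key_bits: int, addr_bits: int) -> int:
    """Largest number of matrices q under which some pair of distinct keys collides."""
    all_matrices = list(product(range(1 << addr_bits), repeat=key_bits))
    worst = 0
    for x in range(1 << key_bits):
        for y in range(x + 1, 1 << key_bits):
            collisions = sum(
                h3_hash(x, q, key_bits) == h3_hash(y, q, key_bits)
                for q in all_matrices
            )
            worst = max(worst, collisions)
    return worst


# Figure 1 dimensions: 3-bit keys and 2-bit addresses, so |H| = 2**6 = 64 and m = 4.
print(max_pair_collisions(3, 2), "<=", 64 // 4)
```

The exhaustive check is only feasible for tiny dimensions such as those of Figure 1. For realistic key widths the bound follows from the linearity of $h_q$ rather than from enumeration.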

Experiments

Our hypothesis is that by choosing hashing functions at random from the class $H_3$, the analytical performance of hashing schemes can be achieved in practice on real-life files. A lengthy justification could be given, but it is outside the scope of this paper [5], [7]. In order to verify this hypothesis, we conducted a series of experiments on real-life data sets. No hardware needs to be built for these experiments. We used uniform hashing and separate chaining as collision resolution schemes. These are typical and simple schemes. The implication is that if certain functions perform according to analytical predictions on these schemes, they will do so for any other hashing scheme.

Each set of experiments was performed as follows. The hash table size, bucket size and load factor are fixed. The number of keys corresponding to the load factor is selected from the test data, which consists of 32-bit integers. A hashing function is generated by generating the requisite number of rows of random numbers. All the keys are then hashed using the hashing function, and the search lengths are computed. The same is repeated for 500 different hashing functions.

Tables 1 and 2 list the average successful and unsuccessful search lengths for separate chaining. The table length was 1024 buckets. The keys were from a file of user-ids from a computer system. We see that the experimental and analytical results agree very closely (the analytical results are from [5], [8] and [10]). The agreement is so close that we do not think statistical tests are necessary. Similar results were obtained for other test files.

In each experiment, one of the keys has the longest search length. The corresponding value is noted. The average value of the length of the longest probe sequence over all the experiments is listed in Table 3. We see that here again the experimental values agree closely with the analytically predicted values.
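The experimental procedure for separate chaining can be imitated in software with a small simulation. This is a sketch under stated assumptions, not the authors' program. It uses synthetic random 32-bit keys in place of the real-life files, a bucket size of 1, and 200 random functions instead of 500 to keep the run short. The analytical value it compares against is the standard expected successful search length for separate chaining with bucket size 1, $1 + (n-1)/(2m)$. The names `h3_hash` and `simulate` are illustrative.

```python
import random


def h3_hash(key: int, rows: list) -> int:
    """H3 hash: XOR the matrix rows selected by the 1-bits of key (bit 0 selects rows[0])."""
    address = 0
    for row in rows:
        if key & 1:
            address ^= row
        key >>= 1
    return address


def simulate(trials: int, addr_bits: int, n_keys: int, key_bits: int = 32, seed: int = 1):
    """Return (mean successful search length, mean llps) over `trials` random functions."""
    rng = random.Random(seed)
    m = 1 << addr_bits
    # Assumption: synthetic distinct keys stand in for the paper's real-life files.
    keys = rng.sample(range(1 << key_bits), n_keys)
    search_sum = 0.0
    llps_sum = 0
    for _ in range(trials):
        # One random j-bit number per key bit forms the i x j matrix q.
        rows = [rng.randrange(m) for _ in range(key_bits)]
        chain = [0] * m
        for key in keys:
            chain[h3_hash(key, rows)] += 1
        # A chain of length c costs 1 + 2 + ... + c probes to find each of its keys once.
        search_sum += sum(c * (c + 1) // 2 for c in chain) / n_keys
        llps_sum += max(chain)  # longest probe sequence in this table
    return search_sum / trials, llps_sum / trials


N_KEYS, ADDR_BITS = 819, 10  # load factor of about 0.8 with m = 1024
avg_search, avg_llps = simulate(trials=200, addr_bits=ADDR_BITS, n_keys=N_KEYS)
analytical = 1 + (N_KEYS - 1) / (2 * (1 << ADDR_BITS))  # separate chaining, b = 1
print(f"successful search: experimental {avg_search:.3f}, analytical {analytical:.3f}")
print(f"average llps over 200 functions: {avg_llps:.2f}")
```

Fixing the key set and varying only the hash function mirrors the procedure described above: the averages are taken over randomly chosen members of $H_3$, not over random data.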
Tables 4-6 show similar results for uniform hashing. For llps, the analytical results given in [10] are values for the case where the number of keys hashed is fixed at 1000 and the bucket size and load factor change. The experimental results for llps in Table 6 are obtained accordingly.

Figure 2 plots the probability distribution of the length of the longest probe sequence for $b = 10$, $m = 1024$ and load factor 0.8 for the double hashing scheme. We see that the llps is narrowly distributed between 4 and 11, with the peak occurring at 6. Out of 500 experiments, none has a search length greater than 16 (quite a small value compared to the worst case, which is 1024). Only one llps value is 16, and none fall between the 4 to 11 range and that outlier. In view of Knuth's statement about the worst case of hashing, these results are the most significant of all. They show, as Gonnet predicted, that the probability of the worst case of hashing occurring is ridiculously small and that the llps is narrowly distributed [9]. The value of E(llps) itself is relatively small, and the probability of llps being much higher than E(llps) is very small.

4 Conclusions

There is no literature describing the performance of hashing functions suitable for hardware applications, although there are a number of applications for hashing in hardware. We have shown that by choosing functions at random from the class $H_3$, the theoretically predicted performance of hashing schemes can be achieved in practice. The results about the llps also show that Knuth's fears (which appear to be the reason that led Thakkar and Knowles to infer "... requires the storage of about 1000 pseudo random numbers into PROM" [4]) are not justified. These functions can be used in any of the applications discussed in Section 2. We are investigating hash tables for hardware further.

Table 1. Expected length of successful search for Separate Chaining
Columns: b; load = 0.6, 0.7, 0.8, 0.9, each with experimental (exptl) and analytical (anal) values.

Table 2. Expected length of unsuccessful search for Separate Chaining
Columns: b; load = 0.6, 0.7, 0.8, 0.9, each with experimental (exptl) and analytical (anal) values.

Table 3. Expected llps for Separate Chaining, m = 1024
Columns: b; load = 0.6, 0.7, 0.8, 0.9, each with experimental (exptl) and analytical (anal) values.

Table 4. Expected length of successful search for Uniform hashing
Columns: b; load = 0.6, 0.7, 0.8, 0.9, each with experimental (exptl) and analytical (anal) values.

Table 5. Expected length of unsuccessful search for Uniform hashing
Columns: b; load = 0.6, 0.7, 0.8, 0.9, each with experimental (exptl) and analytical (anal) values.

Table 6. Expected llps for Uniform hashing, n =
Columns: b; load = 0.6, 0.7, 0.8, 0.9, each with experimental (exptl) and analytical (anal) values.

Figure 2. The probability distribution of the length of the longest probe sequence, m = 1024, load factor = 0.8, b = 10 (results from 500 experiments). Vertical axis: Probability; horizontal axis: No. of Probes.

References

[1] A. Tanenbaum, Modern operating systems. Prentice Hall.
[2] M. Houdek and G. Mitchell, "Translating a large virtual address," IBM System/38 Tech. Developments, pp. 22-24.
[3] D. Abramson, "Hardware management of a large virtual memory," Proc. 4th Australian Computer Science Conf., vol. 3, no. 1.
[4] S. Thakkar and A. Knowles, "A high-performance memory management scheme," IEEE Computer, pp. 8-22, May.
[5] M. V. Ramakrishna, "Hashing in practice, analysis of hashing and universal hashing," in Proc. ACM SIGMOD Conf., pp. 191-199.
[6] J. Mullin, "A note on universal classes of hash functions," Information Processing Letters, vol. 37, pp. 247-256.
[7] L. Carter and M. Wegman, "Universal classes of hashing functions," Journal of Computer and System Sciences, vol. 18, no. 2, pp. 143-154.
[8] D. Knuth, The art of computer programming, vol. 3. Reading, MA: Addison Wesley.
[9] G. Gonnet, "Expected length of the longest probe sequence in hash code searching," J. ACM, vol. 28, no. 2, pp. 289-304.
[10] P. Larson, "Expected worst-case performance of hash files," The Computer Journal, vol. 25, no. 3, pp. 347-352.
[11] J. Braidt and J. Taylor, "Address hashing circuit for memory with nonchangeable address block," IBM Technical Disclosure Bulletin, vol. 24, no. 7A, pp. 3531-3532.
[12] M. Benhase, "Resetting storage unit directories," IBM Technical Disclosure Bulletin, vol. 25, no. 7B, pp. 3760-3761.
[13] H. Robinson and G. Tayler, "Hashing addresses to a cache on DASD," IBM Technical Disclosure Bulletin, vol. 24, no. 11A, pp. 5354-5356.
[14] P. McKenney, "High-speed event counting and classification using a dictionary hash technique," in 1989 International Conference on Parallel Processing, pp. III-71-III-75.
[15] J. Cocke and W. Worley, "Virtual to real address translation using hashing," IBM Technical Disclosure Bulletin, vol. 24, no. 6, pp. 2724-2726.
[16] K. Ramamohanarao and R. Sacks-Davis, "Hardware address translation for machines with a large virtual memory," Information Processing Letters, vol. 13, no. 1, pp. 23-29.
[17] R. Bryant, "Extendible hashing for line-oriented paging stores," IBM Technical Disclosure Bulletin, vol. 26, no. 11, pp. 6046-6049.
[18] T. Ida and E. Goto, "Performance of parallel hash hardware with key deletion," Information Processing, vol. 77, pp. 643-647.
