CHAPTER 4 BLOOM FILTER

4.1 INTRODUCTION

The Bloom filter was formulated by Bloom (1970) and is widely used today for purposes including web caching, intrusion detection, content-based routing, databases and computer networks (Broder and Mitzenmacher 2005). The theory behind the Bloom filter is described in this chapter. First, the basic Bloom filter is described; then its enhancement to meet the requirements of string detection is explained.

A Bloom filter offers an attractive choice for string matching. It is a randomized technique for testing the membership of a string in a given set of strings. The set of strings is first compressed by computing multiple hash functions over each string, and the compressed set is stored in memory. This set can then be queried to find out whether a given string belongs to it. The two properties of a Bloom filter that make it a viable solution for string matching are the following:

Scalability: A Bloom filter uses a constant amount of memory to represent each string irrespective of the length of the original string. Thus, large strings can be stored in a small memory space, which makes the filter highly scalable in terms of memory usage.

Speed: The amount of computation involved in detecting a string using a Bloom filter is constant. This computation is a calculation of hash functions and the corresponding memory lookups.

Efficient hash functions can be implemented in hardware with little resource consumption. Hence, a hardware implementation of a Bloom filter can perform string matching at high speed.

Bloom filters use little memory to store the compressed strings. The amount of memory depends on the number of strings being compressed and is typically a few megabits; for instance, to store 10,000 strings, around 200k bits are required. Almost all modern FPGAs come with multi-port embedded memory blocks which can be utilized for constructing Bloom filters. However, the real reason for using FPGAs stems from the requirement of memory reconfiguration for Bloom filters. A Bloom filter is maintained for detecting strings of a particular length. If the database of strings to be detected has a non-uniform number of strings for each unique string length, then the Bloom filters need to be tuned to accommodate this non-uniformity and achieve optimal performance. Moreover, since the string length distribution can change over time, the Bloom filters need to be retuned to maintain optimality, which involves reallocation of the block memories and hash functions. While doing this, the underlying hardware needs to change. Hence, FPGAs prove to be extremely effective in such a scenario.

4.2 BLOOM FILTER THEORY

The theory behind the Bloom filter is described in this section. Given a string x, the Bloom filter computes k hash functions on it, producing hash values ranging from 1 to m. It then sets k bits in an m-bit long vector at the addresses corresponding to the k hash values. The same procedure is repeated for all the members of the set. This process is called programming the filter. The query process is similar to programming: a string whose membership is to be verified is given as input to the filter. The Bloom filter generates k hash values using the same hash functions that were used to program the filter.

The bits in the m-bit long vector at the locations corresponding to the k hash values are looked up. If at least one of these k bits is found not set, then the string is declared to be a non-member of the set. If all the bits are found to be set, then the string is said to belong to the set with a certain probability. This uncertainty in the membership comes from the fact that those k bits in the m-bit vector can be set by any of the n members. Thus, finding a bit set does not necessarily imply that it was set by the particular string being queried. Subsequent sections explain the programming and querying processes in detail.

4.2.1 Programming a Bloom Filter

A Bloom filter is essentially a bit vector of length m which is used to efficiently represent a set of bit-strings. Given a set of strings S with n members, a Bloom filter is programmed as follows. For each bit-string x in S, k hash functions, h1(), ..., hk(), are computed on x, producing k values each ranging from 1 to m. Each of these values addresses a single bit in the m-bit vector; hence each bit-string x causes k bits in the m-bit vector to be set to 1. It is to be noted that if one of the k hash values addresses a bit that is already set to 1, then that bit is not changed. Figures 4.1 and 4.2 illustrate Bloom filter programming: two bit-strings, x and y, are programmed in the Bloom filter with k = 3 hash functions and m = 16 bits in the array. It is to be noted that different strings can have overlapping bit patterns. Pseudo-code for programming the Bloom filter is given in Table 4.1.

Table 4.1 Pseudo-code for programming the Bloom filter

BF_Prog(x)
  i.  for (i = 1 to k)
  ii.    Vector[hi(x)] <- 1
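As an illustration of Table 4.1, the following minimal Python sketch programs a set of strings into an m-bit vector. The hash family shown (a salted use of Python's hashlib) and the 0-based indexing are assumptions made only for this example; the hardware filter described later in this chapter uses H3-class hash functions.

    import hashlib

    M = 16          # length of the bit vector (m)
    K = 3           # number of hash functions (k)

    def h(i, x):
        """i-th hash of string x, returning a bit position in [0, M)."""
        digest = hashlib.sha256(f"{i}:{x}".encode()).digest()
        return int.from_bytes(digest[:4], "big") % M

    def bf_prog(vector, x):
        """Table 4.1: set the k bits addressed by h1(x)..hk(x)."""
        for i in range(K):
            vector[h(i, x)] = 1

    vector = [0] * M
    for s in ["x", "y"]:          # programming two strings, as in Figures 4.1-4.2
        bf_prog(vector, s)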

Figure 4.1 Programming a string x in the Bloom filter where k = 3 and m = 16

Figure 4.2 Programming a string y in the Bloom filter where k = 3 and m = 16

4.2.2 Querying a Bloom Filter

Querying the Bloom filter for set membership of a given bit-string x is similar to the programming process. Given bit-string x, k hash values are generated using the same hash functions used to program the filter. The bits in the m-bit vector at the locations corresponding to the k hash values are checked. If at least one of the k bits is 0, then the bit-string is declared to be a non-member of the set, as illustrated in Figure 4.3. If all the bits are found to be 1, then the bit-string is said to belong to the set with a certain probability, as shown in Figure 4.4. If all the k bits are found to be set and x is not a member of S, then the result is a false positive. Pseudo-code for querying the Bloom filter is given in Table 4.2.

Table 4.2 Pseudo-code for querying the Bloom filter

BF_Query(x)
  i.   for (i = 1 to k)
  ii.     if (Vector[hi(x)] = 0) return false
  iii.  return true

Figure 4.3 Querying a string z in the Bloom filter
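Continuing the illustrative sketch from Section 4.2.1 (same hypothetical h(), M, K and vector), the query of Table 4.2 only needs to find a single unset bit to reject a string:

    def bf_query(vector, x):
        """Table 4.2: declare a non-member as soon as one addressed bit is 0."""
        for i in range(K):
            if vector[h(i, x)] == 0:
                return False        # definite mismatch: never a false negative
        return True                 # match, possibly a false positive

    bf_query(vector, "x")   # True: "x" was programmed above
    bf_query(vector, "z")   # usually False; True only in the false-positive case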

Figure 4.4 Querying a string w in the Bloom filter

Figure 4.5 False positive probability

The ambiguity in membership comes from the fact that the k bits in the m-bit vector can be set by any of the n members of S. For instance, as shown in Figure 4.5, q maps to bits which were all set by x and y. Although q ∉ S, the filter shows a match. Thus, finding a bit set does not necessarily imply that it was set by the particular bit-string being queried.

However, finding a 0 bit certainly implies that the bit-string does not belong to the set; if it were a member, then all k bits would have been set when the Bloom filter was programmed.

4.2.3 False Positive Probability

This section derives the mathematical expression for the false positive probability, i.e., the probability of finding all the k lookup bits set for a bit-string that was not programmed. The probability that a particular bit of the m-bit vector is set to 1 by a single hash function is simply 1/m, so the probability that it is left at 0 is (1 - 1/m). Since each of the n programmed bit-strings sets k bits in the vector, the probability that the bit is still 0 after programming is (1 - 1/m)^{nk}, and the probability that it is 1 becomes 1 - (1 - 1/m)^{nk}. For a bit-string to be detected as a possible member of the set, all k bit locations generated by the hash functions need to be 1. The probability that this happens, f, is given by Equation (4.1).

f = [1 - (1 - 1/m)^{nk}]^k    (4.1)

For large values of m, the above equation reduces to Equation (4.2).

f = (1 - e^{-nk/m})^k    (4.2)

This probability is independent of the input bit-string and is termed the false positive probability. The false positive probability can be reduced by choosing appropriate values for m and k for a given size of the member set, n. It is clear that the size of the bit vector, m, needs to be much larger than the size of the bit-string set, n. For a given ratio m/n, the false positive probability can be reduced by increasing the number of hash functions, k. In the optimal case, when the false positive probability is minimized with respect to k, the following relationship is obtained.

k = (m/n) ln 2    (4.3)

The false positive probability at this optimal point is given by Equation (4.4).

f = (1/2)^k    (4.4)

It should be noted that if the false positive probability is to be kept fixed, then the size of the filter, m, needs to scale linearly with the size of the bit-string set, n. In an optimally configured Bloom filter, the probability of finding any given bit set is 0.5. Tables 4.3, 4.4 and 4.5 and Figure 4.6 give the relationship between the false positive ratio and combinations of m/n and k.

4.3 PRACTICAL HASH FUNCTIONS

A hash function is a well-defined procedure or mathematical function that converts a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an index to an array. The values returned by a hash function are called hash values, hash codes, hash sums, checksums or simply hashes.

Table 4.3 False positive rate under various m/n and k combinations

m/n  k (opt)  k=1     k=2     k=3     k=4     k=5       k=6       k=7       k=8
2    1.39     0.393   0.4
3    2.08     0.283   0.237   0.253
4    2.77     0.221   0.155   0.147   0.16
5    3.46     0.181   0.109   0.092   0.092   0.101
6    4.16     0.154   0.0804  0.0609  0.0561  0.0578    0.0638
7    4.85     0.133   0.0618  0.0423  0.0359  0.0347    0.0364
8    5.55     0.118   0.0489  0.0306  0.024   0.0217    0.0216    0.0229
9    6.24     0.105   0.0397  0.0228  0.0166  0.0141    0.0133    0.0135    0.0145
10   6.93     0.0952  0.0329  0.0174  0.0118  0.00943   0.00844   0.00819   0.00846
11   7.62     0.0869  0.0276  0.0136  0.0086  0.0065    0.00552   0.00513   0.00509
12   8.32     0.08    0.0236  0.0108  0.0065  0.00459   0.00371   0.00329   0.00314
13   9.01     0.074   0.0203  0.0088  0.0049  0.00332   0.00255   0.00217   0.00199
14   9.7      0.0689  0.0177  0.0072  0.0038  0.00244   0.00179   0.00146   0.00129
15   10.4     0.0645  0.0156  0.006   0.003   0.00183   0.00128   0.001     0.00085
16   11.1     0.0606  0.0138  0.005   0.0024  0.00139   0.000935  0.0007    0.00057
17   11.8     0.0571  0.0123  0.0042  0.0019  0.00107   0.000692  0.0005    0.00039
18   12.5     0.054   0.0111  0.0036  0.0016  0.00084   0.000519  0.00036   0.00028
19   13.2     0.0513  0.01    0.0031  0.0013  0.00066   0.000394  0.00026   0.00019
20   13.9     0.0488  0.0091  0.0027  0.0011  0.00053   0.000303  0.0002    0.00014
21   14.6     0.0465  0.0083  0.0024  0.0009  0.00043   0.000236  0.00015   0.0001
22   15.2     0.0444  0.0076  0.0021  0.0008  0.00035   0.000185  0.00011   7.46E-05
23   15.9     0.0425  0.0069  0.0018  0.0006  0.00029   0.000147  8.56E-05  5.55E-05
24   16.6     0.0408  0.0064  0.0016  0.0006  0.00024   0.000117  6.63E-05  4.17E-05
25   17.3     0.0392  0.0059  0.0015  0.0005  0.0002    9.44E-05  5.18E-05  3.16E-05
26   18       0.0377  0.0055  0.0013  0.0004  0.00016   7.66E-05  4.08E-05  2.42E-05
27   18.7     0.0364  0.0051  0.0012  0.0004  0.00014   6.26E-05  3.24E-05  1.87E-05
28   19.4     0.0351  0.0048  0.0011  0.0003  0.00012   5.15E-05  2.59E-05  1.46E-05
29   20.1     0.0339  0.0044  0.0009  0.0003  9.96E-05  4.26E-05  2.09E-05  1.14E-05
30   20.8     0.0328  0.0042  0.0009  0.0002  8.53E-05  3.55E-05  1.69E-05  9.01E-06
31   21.5     0.0317  0.0039  0.0008  0.0002  7.33E-05  2.97E-05  1.38E-05  7.16E-06
32   22.2     0.0308  0.0037  0.0007  0.0002  6.33E-05  2.50E-05  1.13E-05  5.73E-06

Table 4.4 False positive rate under various m/n and k combinations

m/n  k (opt)  k=9       k=10      k=11      k=12      k=13      k=14      k=15      k=16
11   7.62     0.00531
12   8.32     0.00317   0.00334
13   9.01     0.00194   0.00198   0.0021
14   9.7      0.00121   0.0012    0.00124
15   10.4     0.00078   0.00074   0.00075   0.00078
16   11.1     0.00051   0.00047   0.00046   0.00047   0.00049
17   11.8     0.00034   0.0003    0.00029   0.00028   0.00029
18   12.5     0.00023   0.0002    0.00018   0.00018   0.00018   0.00018
19   13.2     0.00016   0.00013   0.00012   0.00011   0.00011   0.00011   0.00011
20   13.9     0.00011   8.89E-05  7.77E-05  7.12E-05  6.79E-05  6.71E-05  6.84E-05
21   14.6     7.59E-05  6.09E-05  5.18E-05  4.63E-05  4.31E-05  4.17E-05  4.16E-05  4.27E-05
22   15.2     5.42E-05  4.23E-05  3.50E-05  3.05E-05  2.78E-05  2.63E-05  2.57E-05  2.59E-05
23   15.9     3.92E-05  2.97E-05  2.40E-05  2.04E-05  1.81E-05  1.68E-05  1.61E-05  1.59E-05
24   16.6     2.86E-05  2.11E-05  1.66E-05  1.38E-05  1.20E-05  1.08E-05  1.02E-05  9.87E-06
25   17.3     2.11E-05  1.52E-05  1.16E-05  9.42E-06  8.01E-06  7.10E-06  6.54E-06  6.22E-06
26   18       1.57E-05  1.10E-05  8.23E-06  6.52E-06  5.42E-06  4.70E-06  4.24E-06  3.96E-06
27   18.7     1.18E-05  8.07E-06  5.89E-06  4.56E-06  3.70E-06  3.15E-06  2.79E-06  2.55E-06
28   19.4     8.96E-06  5.97E-06  4.25E-06  3.22E-06  2.56E-06  2.13E-06  1.85E-06  1.66E-06
29   20.1     6.85E-06  4.45E-06  3.10E-06  2.29E-06  1.79E-06  1.46E-06  1.24E-06  1.09E-06
30   20.8     5.28E-06  3.35E-06  2.28E-06  1.65E-06  1.26E-06  1.01E-06  8.39E-06  7.26E-06
31   21.5     4.10E-06  2.54E-06  1.69E-06  1.20E-06  8.93E-07  7.00E-07  5.73E-07  4.87E-07
32   22.2     3.20E-06  1.94E-06  1.26E-06  8.74E-07  6.40E-07  4.92E-07  3.95E-07  3.30E-07

Table 4.5 False positive rate under various m/n and k combinations

m/n  k (opt)  k=17      k=18      k=19      k=20      k=21      k=22      k=23      k=24
22   15.2     2.67E-05
23   15.9     1.61E-05
24   16.6     9.84E-06  1.00E-05
25   17.3     6.08E-06  6.11E-06  6.27E-06
26   18       3.81E-06  3.76E-06  3.80E-06  3.92E-06
27   18.7     2.41E-06  2.34E-06  2.33E-06  2.37E-06
28   19.4     1.54E-06  1.47E-06  1.44E-06  1.44E-06  1.48E-06
29   20.1     9.96E-07  9.35E-07  9.01E-07  8.89E-07  8.96E-07  9.21E-07
30   20.8     6.50E-07  6.00E-07  5.69E-07  5.54E-07  5.50E-07  5.58E-07
31   21.5     4.29E-07  3.89E-07  3.63E-07  3.48E-07  3.41E-07  3.41E-07  3.48E-07
32   22.2     2.85E-07  2.55E-07  2.34E-07  2.21E-07  2.13E-07  2.10E-07  2.12E-07  2.17E-07
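As a quick numeric check (not part of the thesis tables), the entries above follow directly from Equation (4.2). The short Python sketch below reproduces, for example, the m/n = 8, k = 5 entry of Table 4.3 (about 0.0217), and the optimal point of Equations (4.3) and (4.4):

    import math

    def false_positive_rate(m_over_n, k):
        """Equation (4.2): f = (1 - e^{-kn/m})^k, written in terms of m/n."""
        return (1.0 - math.exp(-k / m_over_n)) ** k

    print(false_positive_rate(8, 5))     # ~0.0217, matching Table 4.3
    k_opt = 8 * math.log(2)              # Equation (4.3): k = (m/n) ln 2
    print(k_opt, 0.5 ** k_opt)           # ~5.55 and the corresponding f of Eq. (4.4)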

Figure 4.6 False positive probability (f) vs. number of hash functions (k)

Hash functions are mostly used to speed up table lookup or data comparison tasks such as signature detection, which finds a broad range of applications in the networking domain. Although the idea was conceived in the 1950s, the design of good hash functions is still a topic of active research (Knuth 1973). In this section, the effects of utilizing different hash functions in Bloom filters are analyzed. The performance of different hash functions in hardware was investigated by Ramakrishna et al (1997). Three different types of hash functions for use in Bloom filters were implemented in FPGA.

4.3.1 H3 Class of Universal Hash Functions

The universal class of hash functions was first introduced by Carter et al (2004), who defined a special class of hash functions known as class H3. The definition of the H3 class is as follows. Given any string X consisting of b bits,

X = <x1, x2, x3, ..., xb>

the ith hash function over the string X is defined as

hi(X) = (di1 AND x1) XOR (di2 AND x2) XOR (di3 AND x3) XOR ... XOR (dib AND xb)    (4.5)

where the dij are random coefficients uniformly distributed between 1 and the size of the lookup vector, m, and xj is the jth bit of the input string. AND is a bit-by-bit AND operation and XOR is a logical exclusive-OR operation. A block diagram of the H3 class of hash functions as implemented is given in Figure 4.7.

Figure 4.7 A block diagram of an H3 class universal hash function

The input is shifted left one bit at a time until all 16 bits have been handled. Each bit is logically AND-ed with a random number and, at the end, all AND results are XOR-ed together to obtain the hash value. Hash functions of this type are linear transformations and, as a result, they distribute the index values randomly.
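A minimal software sketch of Equation (4.5) is shown below. The 16-bit widths and the use of Python's random module to pick the dij coefficients are assumptions made for illustration only; in the hardware design the coefficients are fixed random constants.

    import random

    B = 16                                      # signature width in bits (b)
    random.seed(1)
    D = [random.getrandbits(B) for _ in range(B)]   # one random coefficient d_ij per input bit

    def h3(x):
        """Equation (4.5): XOR together the d_ij selected by the set bits of x."""
        result = 0
        for j in range(B):
            if (x >> j) & 1:       # bit x_j gates whether d_ij enters the XOR
                result ^= D[j]
        return result

    h3(0xBEEF)   # hash of an example 16-bit signature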

Implementation of this type of hash function requires sixteen 2-input AND gates and a single 16-input XOR gate for a 16-bit signature. It produces key values of the same size as the input. Pseudo-code to implement the H3 class of hash functions is given in Table 4.6.

Table 4.6 Pseudo-code for the H3 class of universal hash function

for each signature:
  i.   generate as many random numbers as there are bits in the signature
  ii.  left shift the signature to get to the specified bit
  iii. AND each shifted signature with the random number
  iv.  XOR all the results of the ANDs

4.3.2 Bit Extraction Hash Function

This type of hash function consists of selecting j bits out of the b bits of the signature. Depending on how these bits are selected from the input signature, they are classified as regular or randomized bit extraction hash functions. Since regular bit extraction hash functions are constrained in number by the input length, randomized bit extraction hash functions are used. The definition of a randomized bit extraction hash function is as follows. Given any string X consisting of b bits,

X = <x1, x2, x3, ..., xb>

the ith hash function over the string X is defined as

hi(X) = <xl1, xl2, xl3, ..., xlj>    (4.6)

where the lj are random bit positions uniformly distributed between one and the size of the input signature, b bits, and xlj is the input bit located at position lj. A block diagram of the randomized bit extraction hash function as implemented is illustrated in Figure 4.8.

Implementation of this type of hash function requires eight 2-input AND gates and a single 8-input XOR gate for a 16-bit signature. A shifter is necessary to shift the input bits as specified by the random number lj.

Figure 4.8 A block diagram of the bit extraction hash function

These hash functions produce key values that are shorter in bits than the signature. They distribute keys randomly by extracting bits at positions chosen by random numbers. Pseudo-code to simulate this hash function is given in Table 4.7.

Table 4.7 Pseudo-code for the bit extraction hash function

for each signature:
  i.   generate as many random numbers as there are bits in the index
  ii.  right shift the signature to get to the random bit position
  iii. move the bit at the random position to the correct position in the index by left or right shifting
  iv.  XOR all the results of shifting
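A small Python sketch of Equation (4.6), assuming a 16-bit signature reduced to an 8-bit index; the bit positions in L are hypothetical random choices made once per hash function:

    import random

    B = 16                                        # signature width (b)
    J = 8                                         # index width (j)
    random.seed(2)
    L = [random.randrange(B) for _ in range(J)]   # random bit positions l_1..l_j

    def bit_extraction_hash(x):
        """Equation (4.6): concatenate the input bits found at positions L."""
        index = 0
        for out_pos, l in enumerate(L):
            bit = (x >> l) & 1               # pick out bit x_l
            index |= bit << out_pos          # place it at its position in the index
        return index

    bit_extraction_hash(0xBEEF)   # 8-bit hash of an example 16-bit signature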

4.3.3 Hash Functions from the XOR Method

These hash functions partition the b-bit long input signature into segments of j bits. The segments are XOR-ed to obtain the hash value. The segments can be formed either in a regular manner or randomly, as with the bit extraction hash functions. To obtain random indices, random segment-forming hash functions are used. The definition of the hash functions from the XOR method is as follows. Given any string X consisting of b bits,

X = <x1, x2, x3, ..., xb>

the ith hash function over the string X is defined as

hi(X) = (xs1 XOR xs2), (xs3 XOR xs4), ..., (xs(j-1) XOR xsj)    (4.7)

where the sj are uniformly distributed random bit positions in the input string and xsj is the bit at the position specified by sj. Two segments of length j bits are formed and XOR-ed. Figure 4.9 illustrates a block diagram of a hash function from the XOR method. Implementation of this type of hash function requires a shifter to get to the bit at the random position, plus eight 2-input XOR gates and an 8-input XOR gate. The length of the resulting hash value is smaller in bits than the input.

Figure 4.9 A block diagram of the hash function using the XOR method
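The following Python sketch illustrates Equation (4.7) under the same assumptions as before (16-bit signature, 8-bit index); the two hypothetical random segments S1 and S2 of eight bit positions each are XOR-ed bitwise:

    import random

    B = 16                                        # signature width (b)
    J = 8                                         # index width (j)
    random.seed(3)
    S1 = [random.randrange(B) for _ in range(J)]  # bit positions of segment 1
    S2 = [random.randrange(B) for _ in range(J)]  # bit positions of segment 2

    def xor_method_hash(x):
        """Equation (4.7): pair up randomly selected bits and XOR each pair."""
        index = 0
        for out_pos, (a, b) in enumerate(zip(S1, S2)):
            bit = ((x >> a) & 1) ^ ((x >> b) & 1)   # x_{s1} XOR x_{s2} for this pair
            index |= bit << out_pos
        return index

    xor_method_hash(0xBEEF)   # 8-bit hash of an example 16-bit signature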

However, they map the inputs to the hash values in a completely random manner due to the random selection of bits from the input. Pseudo-code to implement this type of hash function is given in Table 4.8.

Table 4.8 Pseudo-code for the XOR-method hash function

for each signature:
  i.   generate twice as many random numbers as there are bits in the index
  ii.  right shift the signature to get the random bit positions for the two segments
  iii. XOR the bits of each segment
  iv.  right shift the XOR result to get the correct position

4.3.4 FPGA Implementation of Hash Functions

To meet today's high-speed networks with line speeds of 10 Gbps and beyond, FPGA implementation is a feasible solution. The performance of the three different hash functions in hardware was investigated.

Table 4.9 FPGA implementation of hash functions

Hash Function    LUTs         Flip Flops   Block RAMs
Universal        2990 (4.4%)  2295         6
Bit Extraction   4550 (9%)    3998         7
XOR Method       3050 (4.5%)  2567         6

The three different types of hash functions were used in Bloom filters to examine their effects on the performance of the low power architecture. Logical designs of the low power lookup Bloom filter with each type of hash function were implemented on the Xilinx XCV2000E FPGA, and the utilization of LUTs, flip-flops and Block Random Access Memories (RAMs) is summarized in Table 4.9.

Device utilization is highest for the bit extraction hash function. Based on these results, the universal H3 hash function is selected for the further power analysis of the low power lookup Bloom filter.

4.4 TYPICAL BLOOM FILTER ARCHITECTURE

A block diagram of a typical Bloom filter is illustrated in Figure 4.10. Given a string X, which is a member of the signature set, the Bloom filter computes k hash values on the input X, uniformly distributed between 1 and the size of the lookup vector, m. It then uses these hash values as indices into the m-bit long lookup vector and sets the bits corresponding to the indices given by the computed hash values. This procedure is repeated for each member of the signature set. For an input string Y, the Bloom filter computes k hash values using the same hash functions used in programming the Bloom filter.

Figure 4.10 Typical Bloom filter

The Bloom filter looks up the bit values located at the offsets (computed hash values) in the bit vector. If it finds any bit unset at those addresses, it declares the input string a non-member of the signature set, which is called a mismatch. Otherwise, it finds that all the bits are set and concludes that the input string may be a member of the signature set with a certain false positive probability, which is called a match.

4.5 DRAWBACK OF THE DSLT BLOOM FILTER

A Bloom filter never produces false negatives: if it finds that an input certainly does not belong to the signature set, it decides that the input is a non-member. However, it may produce false positives, where a non-member input is reported as a member of the set. Following the analysis of Dharampurikar et al (2004), the false positive probability f is calculated by Equation (4.2). In order to minimize the false positive probability, the value of m must be considerably larger than n, and for a fixed value of m/n, k must be large enough that f is minimized. Since the number of hash functions in a Bloom filter is kept large to reduce the false positive probability, it is intuitive that the total power consumption is large. During the programming phase of the Bloom filter, not much can be done to reduce the power consumption, otherwise the Bloom filter would produce many false positives. However, while performing lookups over the Bloom filter, the number of hash functions used to produce a decision can be reduced significantly. This is because a Bloom filter never produces false negatives, and it is enough to find a single zero in the m-bit long lookup vector to conclude that there is a mismatch. Ilhan Kaya and Taskin Kocak (2006) call this type of lookup operation the low power lookup technique. The architecture to support such a lookup operation for a DSLT is illustrated in Figure 4.11, where the number of hash functions per stage (r) is k/2. The drawback of the DSLT scheme presented by Ilhan Kaya and Taskin Kocak (2006) is that it does not investigate lookup schemes divided into further stages.

This research work continues the investigation of the lookup technique with further stages, where the number of hash functions per stage (r) is 1, k/2, k/4 or k/8.

Figure 4.11 DSLT Bloom filter architecture where hash functions per stage r = k/2

4.6 MULTI STAGE LOOK UP TECHNIQUE BASED BLOOM FILTER ARCHITECTURE

The low power Bloom filter architecture with r = k/4 is illustrated in Figure 4.12. If a decision is reached in the first stage itself, then 3/4 of the hash calculations are avoided, as against the half avoided when r = k/2. In a similar fashion, the low power architecture with r = k/8 is considered for power analysis. Figure 4.13 illustrates the architecture where the number of hash functions per stage is r = 1. The H3 class of universal hash functions was used in the hash calculations of the MSLT.
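A behavioural Python sketch of the staged lookup, a software model only that reuses the hypothetical h(), M, K and vector from Section 4.2 (in the hardware, each stage's r hash functions are evaluated in parallel): the query stops as soon as any stage sees an unset bit, so a mismatch caught early leaves the remaining hash functions unevaluated.

    def mslt_query(vector, x, r):
        """Multi-stage lookup: evaluate hash functions r at a time (r = 1, k/4, k/2, ...)."""
        hashes_used = 0
        for start in range(0, K, r):
            stage = range(start, min(start + r, K))
            hashes_used += len(stage)
            if any(vector[h(i, x)] == 0 for i in stage):
                return False, hashes_used   # mismatch decided by this stage
        return True, hashes_used            # all k bits set: match (maybe a false positive)

    mslt_query(vector, "z", r=1)            # typically rejects after very few hashes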

Figure 4.12 MSLT Bloom filter architecture where hash functions per stage r = k/4

Figure 4.13 MSLT Bloom filter architecture where hash functions per stage r = 1

4.7 POWER ANALYSIS OF MSLT ARCHITECTURES

With reference to the discussion in Section 4.3.4, the universal hash function is selected for the implementation of the MSLTs. The basic functional module of the Bloom filter using the universal H3 hash function was implemented in 60 nm technology (Figure 4.14), with the parameters shown in Table 4.10, to derive its power consumption.

The average power calculated is used in the power analysis of the low power Bloom filter architectures in this section.

Table 4.10 Design specifications

Technology      CMOS 60 nm
Power supply    5 V
Metal layers    6
Avg. power      0.801 W @ 5 ns

Figure 4.14 Physical layout of the basic functional module of the Bloom filter using the H3 hash function

A theoretical approach is followed to analyze and compare the power consumption of the different lookup operations offered by the Bloom filter architectures presented in Sections 4.5 and 4.6. A single Bloom filter, as shown in Figure 4.10, uses k hash functions in order to make a decision on the given input. Hence, the power consumption of a Bloom filter performing a regular lookup operation is the sum of the power consumed by each of the hash value computations, P_Hi, the power consumed accessing the memory for each hash value computed, P_Q, and the power consumed by an AND gate:

P_BFreg = Σ_{i=1..k} (P_Hi + P_Q) + P_AND    (4.8)

The power consumption of the AND gate is ignored hereafter, since it is minimal when compared to the power used by the hash functions. The power required to query the m-bit vector is approximately constant for each index calculated by any of the hash functions. The power equation for a single Bloom filter therefore simply becomes the total power used by the hash functions plus the power consumed by querying the m-bit vector for each hash value calculation.

P_BFreg = Σ_{i=1..k} (P_Hi + P_Q)    (4.9)

The power consumption of a regular lookup is compared with that of the low power architecture presented in Figure 4.13 for the 16-bit implementation of the hash functions. In Section 4.3.4, the results of the hardware implementations of the practical hash functions are presented, which recommend the universal class of hash functions, H3, as suitable for hardware applications. Hence, all of the k hash functions are 8-bit H3 class hash functions, and Equation (4.9) becomes

P_BFreg = k (P_H8 + P_Q)    (4.10)

To derive the power consumption of the proposed architecture, a mathematical analysis similar to that of Mitzenmacher (2002) is followed. First, the probability of a match in the first stage is derived.

The probability that a bit is still unset after all the signatures are programmed into the Bloom filter using k independent hash functions is

(1 - 1/m)^{kn} ≈ e^{-kn/m}    (4.11)

where 1/m represents the probability that any one of the m bits is set by a single hash function operating on a single signature. Then (1 - 1/m) is the probability that the bit is unset after a single hash value computation with a single signature. To remain unset, it should not be set by any of the k hash functions, each operating on all signatures in the signature set. Consequently, the probability that any one of the bits is set is

1 - (1 - 1/m)^{kn} ≈ 1 - e^{-kn/m}    (4.12)

In order to produce a match in the first stage, the bits indexed by all r of the independent random hash functions should be set. So the match probability of the first stage, represented as p, is

p = Π_{i=1..r} (1 - e^{-kn/m}) = (1 - e^{-kn/m})^r    (4.13)

The mismatch probability of the first stage is

1 - p = 1 - (1 - e^{-kn/m})^r    (4.14)

With probability (1 - p), the first stage of hash functions in the Bloom filter produces a mismatch when performing a lookup operation. Otherwise, the first stage produces a match, and the second stage is then used to compare the input with the signature sought, as suggested by the proposed architecture.

Therefore, the power consumption of the Bloom filter shown in Figure 4.11, where r = k/2, is given by

P_{BFr=k/2} = P_{1stStage} + P_{Match} · P_{2ndStage}    (4.15)

P_{BFr=k/2} = Σ_{i=1..k/2} (P_Hi + P_Q) + p_{k/2} · Σ_{j=k/2+1..k} (P_Hj + P_Q)    (4.16)

P_{BFr=k/2} = (k/2)(P_H8 + P_Q)(1 + p_{k/2})    (4.17)

The power consumption of a Bloom filter where r = k/4 and r = k/8 is given by Equations (4.18) and (4.19) respectively.

P_{BFr=k/4} = (k/4)(P_H8 + P_Q)(1 + p_{k/4} + p_{k/2} + p_{3k/4})    (4.18)

P_{BFr=k/8} = (k/8)(P_H8 + P_Q)(1 + p_{k/8} + p_{k/4} + p_{3k/8} + p_{k/2} + p_{5k/8} + p_{3k/4} + p_{7k/8})    (4.19)

where p_x denotes the probability, from Equation (4.13), that all of the first x hash lookups find their bits set. As given by Equation (4.20) (Ilhan Kaya and Taskin Kocak 2006), the Power Saving Ratio (PSR) of a Bloom filter implemented with the architectures presented, operating with the two different lookup techniques, can be calculated as

PSR = (P_BFreg - P_{BFr=k/n}) / P_BFreg    (4.20)

Using Equation (4.20), with reference to the power consumption of BF_reg, the PSR of BF_r=k/2, BF_r=k/4 and BF_r=k/8 is calculated for various k values under the specifications given in Table 4.11.

Table 4.11 Design specifications

m/n ratio                     21
Number of signatures, n       1024
Size of the m-bit vector, m   21504
Width of the signature, i     8
P_H8 + P_Q                    0.801 W

Here P_H8 + P_Q is the average power consumption of the basic functional module; as discussed earlier, it comprises the power consumed by both the hash value calculation and the match query for a single hash function. As described in Section 4.7, the basic functional module was implemented in Complementary Metal Oxide Semiconductor (CMOS) 60 nm technology using a back-end tool, and the calculated average power consumption is used in the power analysis of the proposed low power architectures. When the number of hash functions per stage (r) decreases, the power consumption reduces. The PSR of BF_r=k/2, BF_r=k/4 and BF_r=k/8 calculated with reference to BF_reg is plotted in Figure 4.15. The power consumption of the Bloom filter architectures BF_r=k/2, BF_r=k/4 and BF_r=k/8 for different values of the number of hash functions (k) is illustrated in Figure 4.16. When the number of hash functions per stage (r) decreases, the PSR increases. When k increases beyond 128, the PSR of all three architectures converges.
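A short Python sketch of this calculation, a model of Equations (4.13) and (4.17)-(4.20) only, using the Table 4.11 figures; the choice of k = 64 below is an illustrative assumption, not one of the thesis plot points:

    import math

    M_OVER_N = 21                       # m/n ratio from Table 4.11
    P_UNIT = 0.801                      # P_H8 + P_Q in watts (Table 4.11)

    def p_match(k, x):
        """Probability that the first x hash lookups all find set bits (Eq. 4.13)."""
        return (1.0 - math.exp(-k / M_OVER_N)) ** x

    def power_regular(k):
        return k * P_UNIT               # Equation (4.10)

    def power_staged(k, stages):
        """Equations (4.17)-(4.19) generalised to 'stages' equal stages of r = k/stages."""
        r = k / stages
        total = r * P_UNIT                              # the first stage always runs
        for s in range(1, stages):
            total += p_match(k, s * r) * r * P_UNIT     # later stages run only on a match
        return total

    def psr(k, stages):
        """Equation (4.20): power saving ratio relative to the regular lookup."""
        return (power_regular(k) - power_staged(k, stages)) / power_regular(k)

    for stages in (2, 4, 8):
        print(stages, psr(64, stages))  # PSR for k = 64 with r = k/2, k/4, k/8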

Figure 4.15 PSR vs. number of hash functions (k)

Figure 4.16 Power vs. number of hash functions (k)

Observation shows that an increase in k increases the number of basic functional modules used in the design, which increases the device density. Device density is directly proportional to power consumption, as observed from Figure 4.16, and this cannot be compensated for by the parallel lookup techniques proposed. This work therefore suggests that selecting a smaller number of hash functions for the Bloom filter architecture, at the cost of the m/n ratio, results in a better PSR.

4.8 FPGA IMPLEMENTATION OF MSLT ARCHITECTURES

The designs were implemented in hardware using Xilinx 10.1i. Each pattern set was synthesized, placed and routed on the Virtex5 XC5VLX85 (Xilinx, 2009) chip, where the package and speed grade are FF676 and -3 respectively. To evaluate the proposed implementations, simulations are performed considering the following issues:

Table 4.12 FPGA implementation of MSLT architectures (device: Virtex5-LX85T)

Design  Signature size (bits)  No. of signatures  Slices  No. of registers  No. of LUTs
MSLT    32                     16028              7635    6550              30387
MSLT    16                     16028              5626    9690              15375
DSLT    32                     16028              17239   8775              42632
DSLT    16                     16028              8852    13758             28574

Size of the signature (bits): Each signature is 16-bit or 32-bit wide data. The more bits processed per cycle, the better the throughput.

Slice: The slice is the basic FPGA resource in a Xilinx FPGA chip. The number of logic elements in a slice depends on the FPGA device. The number of slices represents the area cost. In Virtex-5, each FPGA slice contains four LUTs and four flip-flops.

Clock period: The clock period reflects the delay of the maximum critical path in the FPGA and can be obtained from the synthesis report of the Xilinx software. The smaller the clock period, the faster the implementation.

Table 4.12 shows the experimental results. The number of signatures in these pattern sets is 16028, and 16-bit and 32-bit designs are simulated for each pattern set. The numbers of registers and LUTs show the device utilization of the proposed architectures. The proposed MSLT architectures consume 29% fewer device resources than the DSLT in this implementation.

4.9 SUMMARY

In this chapter, low power Bloom filter architectures are proposed to support network applications on a hardware platform. Accordingly, a suitable hash function is selected for hardware implementation. Further, the average power consumption of the basic functional module of the Bloom filter using the H3 universal hash function is derived using CMOS 60 nm technology. A mathematical analysis is carried out to calculate the power consumption and PSR of the low power Bloom filter architectures for different values of the number of hash functions per stage (r). The power analysis has shown that reducing the number of hash functions per stage (i.e., increasing the number of stages) reduces the power consumption of the proposed architecture. FPGA implementation results and comparison with similar Bloom filter based signature detection techniques used in network intrusion detection systems (NIDS) show the hardware compatibility of the proposed architecture. The design parameters, namely the number of hash functions (k), the width of the filter (m), the number of stages (r) and the false positive probability (f), can be determined for the proposed architecture from the results shown in Figures 4.15 and 4.16.

If k is smaller, the power consumption decreases because fewer hash functions are used, but the false positive probability increases. If m is larger, the false positive rate is reduced, but the searching time and the power consumed in the filtering stage increase. Hence, the design parameters are selected carefully by understanding the trade-offs among them. The proposed MSLT architecture involves k hash functions evaluated in parallel stages. A pipelined multi-stage architecture was also considered and discussed at an earlier stage of this research. Even though a pipelined architecture reduces the computation time, it introduces more hardware complexity, which directly affects the system's performance.