CSE 535 : Lecture 5
String Matching with Bloom Filters
Washington University, Fall 23
http://www.arl.wustl.edu/arl/projects/fpx/cse535/
Copyright 23, Sarang Dharmapurikar [Guest Lecture]

Background on Bloom Filter
- Data structure proposed by Burton Bloom
- Randomized data structure
  - Strings are stored using multiple hash functions
  - It can be queried to check the presence of a string
- Membership queries result in rare false positives but never false negatives
- Originally used for UNIX spell check
- Modern applications include:
  - Content Networks
  - Summary Caches
  - Route trace-back
  - Network measurements
  - Intrusion Detection

Hash Functions
- Input: x
- Output: H[x]
- Properties
  - Each value of x maps to a value of H[x]
  - Typically: Size of (x) >> Size of (H[x])
  - H[x] evenly distributed over values of x
- Implementation
  - XOR of bits, shifting, rotates, ...
[Figure: input x, hash function H, output H[x]]

Programming a Bloom Filter
- Bloom filter computes k hash functions on the input
[Figure: input string, hash functions H, m-bit vector]

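The implementation hints above (XOR, shifts, rotates) can be illustrated with a small sketch. The function names `rotl32` and `toy_hash`, the rotate and shift amounts, and the seed parameter are all hypothetical choices made for illustration; this is not the hash function used in the lecture's hardware, and no claim is made about the quality of its distribution.

```python
def rotl32(value: int, shift: int) -> int:
    """Rotate a 32-bit value left by `shift` bits (0 <= shift < 32)."""
    value &= 0xFFFFFFFF
    return ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF


def toy_hash(data: bytes, seed: int, out_bits: int) -> int:
    """Hypothetical hash built only from XOR, shift and rotate.

    Maps an input of any size to an `out_bits`-bit value, so that
    Size(x) >> Size(H[x]) as on the slide.
    """
    state = seed & 0xFFFFFFFF
    for byte in data:
        state ^= byte                            # mix in the next input byte
        state = rotl32(state, 5) ^ (state >> 7)  # rotate and shift to spread bits
    # Fold the 32-bit state down to out_bits by XOR-ing its chunks together.
    mask = (1 << out_bits) - 1
    folded = 0
    while state:
        folded ^= state & mask
        state >>= out_bits
    return folded


# Different seeds give k different hash functions over the same input.
addresses = [toy_hash(b"cs535", seed, 12) for seed in (1, 2, 3)]
```

Using a seed to derive several hash functions from one routine is a common software shortcut; a hardware design would more likely compute the k functions in parallel.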
Programming a Bloom Filter
[Figure: string Y, hash functions H, m-bit vector]

Querying a Bloom Filter
[Figure: query string, hash functions H, m-bit vector, result: match]

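Programming and querying can be sketched in a few lines. The class name `BloomFilter`, the use of SHA-256 with a one-byte prefix to derive the k hash functions, and the sizes in the usage lines are assumptions for illustration only; the slides do not specify them.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: an m-bit vector and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k  # this sketch assumes k < 256
        self.bits = [0] * m

    def _indexes(self, item: bytes):
        # Assumption: derive hash function i by prefixing the item with byte i.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def program(self, item: bytes) -> None:
        """Set the k bits addressed by the item's hash values."""
        for idx in self._indexes(item):
            self.bits[idx] = 1

    def query(self, item: bytes) -> bool:
        """Report a match only if all k addressed bits are set."""
        return all(self.bits[idx] for idx in self._indexes(item))


bf = BloomFilter(m=64, k=3)
bf.program(b"X")
bf.program(b"Y")
```

A programmed string always finds its own bits set, which is why false negatives cannot occur. A string that was never programmed can still match if other strings happened to set all of its bits, which is the false positive case.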
Querying a Bloom Filter
[Figure: string W, hash functions H, m-bit vector, result: match (false positive)]

Optimal Parameters of a Bloom filter
- n : number of strings to be stored
- k : number of hash functions
- m : the size of the bit-array (memory)
- The optimal number of hash functions, k, is: k = ln(2) * m/n = 0.693 * m/n
- With this optimal k, the false positive probability is: f = (1/2)^k
- Key Point: False positive probability decreases exponentially with linear increase in the number of hash functions & memory
[Figure: strings hashed by hash functions H into an m-bit array]

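The parameter relations can be checked numerically. The sketch below uses the standard approximation f = (1 - e^(-kn/m))^k for a general k, which is not stated on the slide but reduces to (1/2)^k at k = ln(2) * m/n. The function names and the sample values of n and m are assumptions for illustration.

```python
import math


def optimal_k(n: int, m: int) -> float:
    """Optimal number of hash functions for n strings in m bits."""
    return math.log(2) * m / n


def false_positive_rate(n: int, m: int, k: float) -> float:
    """Standard approximation of the false positive probability."""
    return (1.0 - math.exp(-k * n / m)) ** k


n = 1000                       # hypothetical number of strings
for m in (10 * n, 20 * n):     # hypothetical memory sizes: 10 and 20 bits per string
    k = optimal_k(n, m)
    print(m, round(k, 2), false_positive_rate(n, m, k))
```

Because f = (1/2)^k and k grows linearly with m/n, doubling the memory per string squares the false positive probability, which is the exponential decrease named in the key point.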
Counting Bloom Filters
- A message once programmed in the Bloom filter cannot be deleted
  - Deletion of a message requires clearing the corresponding bits
  - Since a bit can be set by multiple messages, clearing it will disturb other messages
- Counting Bloom filters solve the problem
  - Array of counters instead of array of bits
  - Increment the corresponding counters when a message is added, decrement when deleted
[Figure: messages A and B hashed into the off-chip counter array; counters shared by both messages hold 2]

Counting Bloom Filters
- Maintain Bloom filters on the chip and corresponding counters off the chip
  - Saves on the on-chip resources to implement counters
  - Addition and deletion of messages are rare
- Set the bit when the corresponding counter changes from 0 to 1, clear it when the counter changes from 1 to 0
[Figure: messages A and B, on-chip bit array, off-chip counter array]

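The split between a bit array and a counter array can be sketched as follows. Here two Python lists stand in for the on-chip bit array and the off-chip counter array; the class name, the SHA-256 based hashing, and the sizes are assumptions for illustration, not the lecture's hardware design.

```python
import hashlib


class CountingBloomFilter:
    """Sketch: queries read only the bit array; updates go through counters."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k                # this sketch assumes k < 256
        self.bits = [0] * m       # stands in for the on-chip bit array
        self.counters = [0] * m   # stands in for the off-chip counter array

    def _indexes(self, item: bytes) -> list:
        return [
            int.from_bytes(hashlib.sha256(bytes([i]) + item).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, item: bytes) -> None:
        for idx in self._indexes(item):
            self.counters[idx] += 1
            if self.counters[idx] == 1:   # counter changed from 0 to 1
                self.bits[idx] = 1

    def remove(self, item: bytes) -> None:
        # The caller must only remove messages that were added earlier.
        for idx in self._indexes(item):
            self.counters[idx] -= 1
            if self.counters[idx] == 0:   # counter changed from 1 to 0
                self.bits[idx] = 0

    def query(self, item: bytes) -> bool:
        return all(self.bits[idx] for idx in self._indexes(item))


cbf = CountingBloomFilter(m=64, k=3)
cbf.add(b"A")
cbf.add(b"B")
cbf.remove(b"A")   # B stays programmed even if it shared counters with A
```

Queries never touch the counters, so the slower counter memory is only accessed on the rare additions and deletions.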
Using Bloom filters for String Matching
[Figure: a window of bytes b W ... b 5, b 4, b 3, b 2, b 1 between the entering byte and the leaving byte; Bloom filters BF 3, BF 4, BF 5, ..., BF W; a hash table acting as the false positives resolver]

Using Bloom filters for String Matching
[Figure: the same window of bytes b W ... b 1; Bloom filter for cs535 (labeled BF 6) and the hash table]

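A software sketch of the scanning scheme may help. It assumes one Bloom filter per string length, a window as long as the longest string, and a Python set standing in for the hash table that resolves false positives. The function names, the SHA-256 based hashing, the sample strings, and the parameters m and k are all hypothetical.

```python
import hashlib


def bit_indexes(item: bytes, m: int, k: int) -> list:
    """Assumed hashing scheme: hash function i is SHA-256 over byte i + item."""
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + item).digest()[:8], "big") % m
        for i in range(k)
    ]


def build(signatures, m=1024, k=4):
    """Program one Bloom filter per string length and fill the hash table."""
    filters = {}
    for sig in signatures:
        bits = filters.setdefault(len(sig), [0] * m)
        for idx in bit_indexes(sig, m, k):
            bits[idx] = 1
    return filters, set(signatures)


def scan(stream: bytes, filters, hash_table, m=1024, k=4):
    """Slide the window one byte at a time and check every length."""
    window_size = max(filters)
    matches = []
    for start in range(len(stream)):
        window = stream[start:start + window_size]
        for length in sorted(filters):
            candidate = window[:length]
            if len(candidate) < length:
                continue  # window is cut short at the end of the stream
            if all(filters[length][idx] for idx in bit_indexes(candidate, m, k)):
                # Bloom filter reported a match: confirm it in the hash table.
                if candidate in hash_table:
                    matches.append((start, candidate))
    return matches


filters, table = build([b"worm", b"virus", b"cs535"])
found = scan(b"xxcs535yyvirusworm", filters, table)
```

The hash table is consulted only when a Bloom filter reports a match, so the exact lookup runs rarely. In hardware the per-length filters would be queried in parallel rather than in a loop.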
System Overview
- SDRAM: off-chip, 64 Mega bytes
- SDRAM Controller: reads and writes data given by the user component in the off-chip SDRAM
- Hash Table Interface: implements the hash table around the SDRAM; communicates with the SDRAM controller through a request-grant protocol
- Control Packet Processor (CPP): receives the control packets, decodes the commands in them, and accordingly updates either the Bloom filter or the hash table
- Input Controller: sends control packets to the CPP and data packets to the Bloom filter
- Bloom Filter
- Output Controller: sends a notification packet out when the hash table instructs it to
- Protocol Wrappers: process the packet headers

Bloom Filters on the FPX Platform
- Xilinx XCV2000E FPGA
  - Implements the Reconfigurable Application Device (RAD) on the Field-programmable Port Extender (FPX)
- Contains 160 embedded BlockRAMs
  - Each BlockRAM has dual (2) ports
  - Each BlockRAM stores 4096 bits
  - Enables MP2 to implement large, fast, parallel Bloom filters
[Figure: Field-programmable Port Extender (FPX) platform; Bloom filters implemented on the Reconfigurable Application Device]

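A quick capacity estimate shows what the on-chip memory allows. The sketch assumes 160 BlockRAMs of 4096 bits each (the XCV2000E figures) and, as a purely hypothetical example, k = 10 hash functions. It uses the relation k = ln(2) * m/n from the optimal-parameters slide to estimate how many strings fit if every BlockRAM were devoted to Bloom filters.

```python
import math


def on_chip_bits(block_rams: int = 160, bits_per_block: int = 4096) -> int:
    """Total BlockRAM capacity in bits."""
    return block_rams * bits_per_block


def max_strings(total_bits: int, k: int) -> int:
    """Largest n for which k hash functions are still optimal: n = ln(2) * m / k."""
    return math.floor(total_bits * math.log(2) / k)


total = on_chip_bits()            # 160 * 4096 bits
capacity = max_strings(total, k=10)
print(total, capacity)
```

Under these assumptions the device holds 655,360 bits of BlockRAM, enough for roughly 45 thousand strings at about 14.4 bits per string. A real design would spend some BlockRAMs on other logic, so this is an upper bound only.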
Partial Bloom Filter
[Figure: partial Bloom filter built around one dual-port BlockRAM of 4096 bits. Labels: hash value calculator H(), port A inputs dina, wea, addra, port B inputs dinb, web, addrb, outputs douta and doutb, output (match/no match)]

Partial Bloom Filter
[Figure: the same partial Bloom filter (PBF) with a request decoder. Labels: Address, Valid, PBF BRAM #, Bit, request decoder, hash value calculator H(), dina, wea, addra, dinb, web, addrb, 4096 bits, douta, doutb, output (match/no match)]

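Bloom Filter
[Figure: control interface and hash value calculator producing hash values H1 to H10; H1 and H2 feed PBF 1, H3 and H4 feed PBF 2, H5 and H6 feed PBF 3, H7 and H8 feed PBF 4, H9 and H10 feed PBF 5; the PBF outputs combine into Match]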
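The composition of a Bloom filter from partial Bloom filters can be sketched in software. The sketch assumes that each partial Bloom filter (PBF) is one 4096-bit memory read through two ports, one per hash function, and that the overall match is the AND of the five PBF outputs. The class names and the SHA-256 based address calculation are hypothetical stand-ins for the hash value calculator.

```python
import hashlib

BRAM_BITS = 4096  # assumed size of one dual-port BlockRAM


class PartialBloomFilter:
    """One memory block addressed by two hash functions (one per port)."""

    def __init__(self, first_hash_id: int) -> None:
        self.bits = [0] * BRAM_BITS
        self.hash_ids = (first_hash_id, first_hash_id + 1)

    def _addresses(self, item: bytes) -> list:
        # Assumption: hash function h is SHA-256 over byte h + item,
        # reduced to a 12-bit address.
        return [
            int.from_bytes(hashlib.sha256(bytes([h]) + item).digest()[:2], "big") % BRAM_BITS
            for h in self.hash_ids
        ]

    def program(self, item: bytes) -> None:
        for addr in self._addresses(item):
            self.bits[addr] = 1

    def match(self, item: bytes) -> bool:
        # Both ports must read a 1 for this PBF to report a match.
        return all(self.bits[addr] for addr in self._addresses(item))


class ParallelBloomFilter:
    """Five PBFs give ten hash functions over 5 * 4096 bits."""

    def __init__(self, num_pbfs: int = 5) -> None:
        self.pbfs = [PartialBloomFilter(2 * i + 1) for i in range(num_pbfs)]

    def program(self, item: bytes) -> None:
        for pbf in self.pbfs:
            pbf.program(item)

    def match(self, item: bytes) -> bool:
        return all(pbf.match(item) for pbf in self.pbfs)


pbf_filter = ParallelBloomFilter()
pbf_filter.program(b"cs535")
```

Restricting each pair of hash functions to its own memory block lets all ten lookups proceed at once, at the cost of each hash function addressing only a fifth of the total bits.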