Bloom Filters

References:
Li Fan, Pei Cao, Jussara Almeida, Andrei Broder, "Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol," IEEE/ACM Transactions on Networking, Vol. 8, No. 3, June 2000.
B. Bloom, "Space/time trade-offs in hash coding with allowable errors," CACM, Vol. 13, No. 7, July 1970.
11/21/05 (SSL)

The Problem
A Bloom filter represents a set $A = \{a_1, a_2, \ldots, a_n\}$ of n stored elements (also called keys).
A sequence of keys is later tested, one by one, for membership in the set.
The great majority of keys to be tested do not belong to the set, so a fast reject time is desired, with no false negatives.
There is a tradeoff between the storage space for the Bloom filter and the false positive rate.

The idea
Allocate a vector v of m bits, initially all set to 0.
Choose k independent hash functions, $H_1, H_2, \ldots, H_k$, each with range $\{1, \ldots, m\}$.
For each stored element a, the bits at positions $H_1(a), H_2(a), \ldots, H_k(a)$ in v are set to 1. A particular bit might be set to 1 multiple times.
[Figure: Bloom filter with 4 hash functions]

Membership query
To determine whether b is in the set, the bits at positions $H_1(b), H_2(b), \ldots, H_k(b)$ are checked.
If any of them is 0, then b is certainly not in the set A, so there are no false negatives.
If all of them are 1, b is assumed to be in A, but this may be a false positive.
The parameters k and m are chosen to trade memory space for a small false positive probability.
For a fixed k, the false positive probability decreases as m/n increases; for a fixed m/n, it decreases as k increases, up to the optimum value of k.
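The insert and query steps can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the cited papers: the class name `BloomFilter`, the use of salted SHA-256 to stand in for k independent hash functions, and the 0-based bit positions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Bit vector of m bits probed by k hash functions (positions are 0-based here)."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit: simple, not space-efficient

    def _positions(self, key):
        # Assumption: the k hash functions are simulated by salting SHA-256
        # with the function index i; any k independent hashes would do.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1  # a bit may be set more than once

    def __contains__(self, key):
        # A single 0 bit proves absence; all 1s means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter(m=1024, k=4)
for url in ["a.example/x", "b.example/y"]:
    bf.add(url)
print("a.example/x" in bf)  # True: stored keys are never rejected
```

A key that was never added is usually rejected, but the filter gives no guarantee: the answer "present" only means that all k probed bits happen to be 1.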

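The math
After inserting n keys into a filter of size m (bits), the probability that a particular bit in the filter is still 0 is
$$\left(1 - \frac{1}{m}\right)^{kn}$$
The probability of a false positive in this situation is
$$\left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k} \approx \left(1 - e^{-kn/m}\right)^{k}$$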
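The two probabilities on this slide can be evaluated numerically. The sketch below assumes the standard analysis, in which every hash function picks one of the m bit positions uniformly at random; the function names are illustrative only.

```python
import math


def p_bit_zero(m, n, k):
    # Each of the k*n hash evaluations misses a given bit with probability 1 - 1/m.
    return (1.0 - 1.0 / m) ** (k * n)


def false_positive_exact(m, n, k):
    # All k probed bits of a non-member must already be 1.
    return (1.0 - p_bit_zero(m, n, k)) ** k


def false_positive_approx(m, n, k):
    # Uses (1 - 1/m)^(k*n) ~ exp(-k*n/m), accurate for large m.
    return (1.0 - math.exp(-k * n / m)) ** k


# Example: 8 bits per stored key, 4 hash functions.
print(false_positive_exact(8000, 1000, 4))
print(false_positive_approx(8000, 1000, 4))
```

With m/n = 8 and k = 4 both expressions give a false positive probability of roughly 2.4%.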

Optimal tradeoff
The false positive probability $\left(1 - e^{-kn/m}\right)^{k}$ is minimized for $k = (\ln 2)(m/n)$, in which case it becomes
$$\left(\frac{1}{2}\right)^{k} = (0.6185)^{m/n}$$
[Figure: false positive probability versus m/n, with one curve for k = 4 and one for the optimum integral number of hash functions]
The false positive probability decreases as m/n increases.
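Because k must be an integer in practice, one way to pick it is to compare the two integers next to the real-valued optimum. The following sketch does that under the approximation $(1 - e^{-kn/m})^k$ for the false positive probability; the helper names are hypothetical.

```python
import math


def false_positive(m, n, k):
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m, n):
    # Real-valued minimizer of the false positive probability.
    return math.log(2) * m / n


def best_integral_k(m, n):
    k_real = optimal_k(m, n)
    # Compare the integers just below and just above the real optimum (at least 1).
    candidates = {max(1, math.floor(k_real)), max(1, math.ceil(k_real))}
    return min(candidates, key=lambda k: false_positive(m, n, k))


m, n = 8000, 1000                  # 8 bits per stored key
print(optimal_k(m, n))             # about 5.55
print(best_integral_k(m, n))       # 6
print(0.5 ** optimal_k(m, n))      # about 0.0214, i.e. 0.6185 ** 8
```

For m/n = 8 the choice between k = 5 and k = 6 changes the false positive probability only in the fourth decimal place, so a smaller k may be preferred when hash computations are expensive.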

Handling membership changes
For each location h in the bit array, $h = 1, \ldots, m$, maintain a count c(h), initially zero, equal to the number of times the bit location has been set to 1.
When a key a joins (leaves) A, the counts $c(H_1(a)), c(H_2(a)), \ldots, c(H_k(a))$ are incremented (decremented) by 1.
A bit location is turned on when its count changes from 0 to 1.
A bit location is turned off when its count changes from 1 to 0.
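A counting variant along these lines might look as follows. It is a sketch under assumptions: the class name `CountingBloomFilter` is made up, salted SHA-256 again stands in for the k hash functions, counts are unbounded Python integers, and callers are expected to remove only keys that were previously added.

```python
import hashlib


class CountingBloomFilter:
    """Keeps a count per bit location so that keys can also leave the set."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.counts = [0] * m  # the bit at location h is "on" when counts[h] > 0

    def _positions(self, key):
        # Assumption: salted SHA-256 simulates k independent hash functions.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, key):
        for pos in self._positions(key):
            self.counts[pos] += 1  # 0 -> 1 turns the bit on

    def remove(self, key):
        # Only valid for keys that were added earlier.
        for pos in self._positions(key):
            if self.counts[pos] > 0:
                self.counts[pos] -= 1  # 1 -> 0 turns the bit off

    def __contains__(self, key):
        return all(self.counts[pos] > 0 for pos in self._positions(key))


cbf = CountingBloomFilter(m=1024, k=4)
cbf.add("a.example/x")
cbf.add("b.example/y")
cbf.remove("a.example/x")
print("b.example/y" in cbf)  # True: removing one key never hides another
```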

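How much memory for counts
After inserting n keys with k hash functions into an array of m bits, the probability that any count is greater than or equal to i is bounded by
$$\Pr\left(\max_h c(h) \ge i\right) \le m \binom{nk}{i} \frac{1}{m^{i}} \le m \left(\frac{e\,n\,k}{i\,m}\right)^{i}$$
Assume the number of hash functions is less than $(\ln 2)(m/n)$, which is the optimum; then
$$\Pr\left(\max_h c(h) \ge i\right) \le m \left(\frac{e \ln 2}{i}\right)^{i}$$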

How much memory for counts (cont.)
For i = 16, we have
$$\Pr\left(\max_h c(h) \ge 16\right) \le 1.37 \times 10^{-15} \times m$$
Allowing 4 bits per count, the probability of overflow is negligible for a practical value of m.
If a count would ever exceed 15, it is left at 15. The consequence is that, many deletions later, the Bloom filter may allow a false negative.
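The overflow bound is easy to check numerically. The sketch below assumes the bound $m\,(e \ln 2 / i)^i$ from the Summary Cache analysis, which holds when the number of hash functions is at most $(\ln 2)(m/n)$; the function name is illustrative.

```python
import math


def overflow_bound(m, i=16):
    # Upper bound on the probability that any of the m counts reaches i.
    return m * (math.e * math.log(2) / i) ** i


print(overflow_bound(1))        # about 1.37e-15, the per-location factor
print(overflow_bound(2 ** 20))  # about 1.4e-9 for a filter of one million locations
```

Even for filters with millions of locations the bound stays far below any realistic failure rate, which is why 4-bit counters are considered sufficient.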

Application: Summary Cache
Cooperating proxies sit behind an Internet bottleneck and serve each other's cache misses.
With the ICP protocol, a cache miss causes queries to be sent to all other proxies (not scalable).
With Summary Cache, each proxy:
  computes a summary (Bloom filter) of the URLs of its cached documents, together with counts for the bit locations;
  sends the bit array to every other proxy;
  sends an updated summary when the percentage of new documents reaches a threshold.

More on Summary Cache
A local cache miss results in queries sent only to proxies whose summaries indicate that they have the requested document, a large reduction in message traffic.
Summaries do not have to be up to date or accurate:
  False misses: the total hit rate is reduced, due to delayed updates.
  False hits: some bandwidth is wasted.
  Remote stale hits: some bandwidth is wasted.
The memory required increases with the number of proxies.
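The lookup path can be illustrated with a toy model. This is a simplified sketch, not the Summary Cache protocol itself: the `Proxy` class, the `peers_to_query` helper, and the filter parameters are assumptions, and the summary is rebuilt from the cache on each publish instead of being maintained incrementally with counts.

```python
import hashlib


def positions(url, m, k):
    # Assumption: salted SHA-256 simulates k independent hash functions.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{url}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]


class Proxy:
    def __init__(self, name, m=256, k=4):
        self.name = name
        self.m, self.k = m, k
        self.cache = set()             # URLs actually cached
        self.published = bytearray(m)  # summary last sent to peers; may be stale

    def store(self, url):
        self.cache.add(url)

    def publish_summary(self):
        # Rebuild the bit array from the current cache and "send" it to peers.
        bits = bytearray(self.m)
        for url in self.cache:
            for pos in positions(url, self.m, self.k):
                bits[pos] = 1
        self.published = bits

    def summary_has(self, url):
        return all(self.published[p] for p in positions(url, self.m, self.k))


def peers_to_query(url, peers):
    # On a local miss, query only the peers whose summary matches.
    return [p for p in peers if p.summary_has(url)]


a, b = Proxy("A"), Proxy("B")
a.store("http://x.example/1")
a.publish_summary()
b.store("http://x.example/2")  # not yet published: peers see a false miss
print([p.name for p in peers_to_query("http://x.example/1", [a, b])])  # ['A']
```

Proxy B holds the second URL but has not published a new summary, so a peer looking for it would skip B. That is the false miss caused by delayed updates.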

Other applications
Sarang Dharmapurikar, et al., "Longest Prefix Matching using Bloom Filters," Proceedings ACM SIGCOMM 2003, August 2003.
Bloom filter i represents the set of IP address prefixes of length i, $i = 1, \ldots, 32$ (some filters may be empty).
To find the next hop for a particular IP address, the address is used to probe all Bloom filters in parallel to get the matching prefix lengths.
Then the hash table associated with the longest matching prefix length is probed first, followed by shorter matching lengths if that probe turns out to be a false positive.
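The lookup can be sketched in software as follows. This is an illustration under assumptions, not the hardware design from the paper: the `PrefixTable` class and its parameters are invented for the example, the 32 filters are probed in a loop instead of in parallel, and Python dictionaries stand in for the per-length hash tables.

```python
import hashlib
import ipaddress

M, K = 4096, 4  # assumed filter size and number of hash functions


def positions(key):
    # Assumption: salted SHA-256 simulates K independent hash functions.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % M
        for i in range(K)
    ]


class PrefixTable:
    def __init__(self):
        self.filters = {i: bytearray(M) for i in range(1, 33)}  # one filter per length
        self.tables = {i: {} for i in range(1, 33)}             # exact prefix -> next hop

    def add_route(self, cidr, next_hop):
        net = ipaddress.ip_network(cidr)
        length = net.prefixlen
        prefix = int(net.network_address) >> (32 - length)  # the top `length` bits
        for pos in positions(prefix):
            self.filters[length][pos] = 1
        self.tables[length][prefix] = next_hop

    def lookup(self, addr):
        a = int(ipaddress.ip_address(addr))
        # Step 1: probe every filter to collect candidate prefix lengths.
        candidates = [
            i for i in range(1, 33)
            if all(self.filters[i][p] for p in positions(a >> (32 - i)))
        ]
        # Step 2: check the hash tables, longest candidate first; a filter
        # false positive simply falls through to the next shorter length.
        for i in sorted(candidates, reverse=True):
            hop = self.tables[i].get(a >> (32 - i))
            if hop is not None:
                return hop
        return None


rt = PrefixTable()
rt.add_route("10.0.0.0/8", "A")
rt.add_route("10.1.0.0/16", "B")
print(rt.lookup("10.1.2.3"))  # B: the /16 is the longest matching prefix
print(rt.lookup("10.2.3.4"))  # A: only the /8 matches
```

The filters only narrow down which hash tables are worth probing; correctness comes from the exact-match tables, so a false positive costs an extra probe but never a wrong next hop.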

Other applications (cont.)
Alex C. Snoeren, et al., "Hash-Based IP Traceback," Proceedings ACM SIGCOMM 2001, August 2001.
Routers compute a 32-bit digest over the invariant portion of the IP header and the first 8 bytes of payload of every packet forwarded.
Digests are stored in Bloom filters to save memory (down to 0.5% of link bandwidth per unit time).
The digests stored in routers are used to trace the source of attack packets.

Other applications (cont.)
Intrusion detection, content-based routing.
A. Broder and M. Mitzenmacher, "Network applications of Bloom filters: A survey," Proceedings 40th Annual Allerton Conference, October 2002.