Compact data structures: Bloom filters
1 Compact data structures: Bloom filters
Luca Becchetti, Sapienza Università di Roma, Rome, Italy. April 7, 2010
3 Dictionaries
A dictionary is a dynamic set S of objects from a discrete universe U on which (at least) the following operations are possible:
- item insertion
- item deletion
- set membership: decide whether item $x \in S$
Typically, each element in S is assumed to be uniquely identified by a key. Let obj(k) be the object with key k. Operations:
- insert(x, S): insert item x
- delete(k, S): delete the item whose key is k
- retrieve(k, S): retrieve obj(k)
This is a minimal set of operations. Any database implements a (greatly augmented) dictionary.
4 Testing for membership
In many applications dictionaries are large or even huge, so any of the operations above potentially involves access to secondary storage.
Set membership: retrieval (deletion) can be restated as follows: if $\mathrm{obj}(k) \in S$ then retrieve(k, S) (respectively delete(k, S)).
Set membership ismember(k, S): if it returns false, then $\mathrm{obj}(k) \notin S$.
Why this matters: membership can be tested efficiently using compact data structures.
- The check is performed in main memory.
- If it returns false, there is no need to access secondary storage.
5 Example: spell-checker
Provide a first level of spell checking for a text editor, which must quickly report spelling mistakes to the user.
- Exact check: needs an efficient data structure. Trees are typically used, with terms corresponding to nodes (typically leaves) of the tree. Since the thesaurus holds a large number of terms, the tree may be too large for quick response times.
- Idea: trade accuracy for efficiency.
6 Bloom filters
Used to provide a compact summary of a set of keys.
- Key k is hashed t times onto $[m] = \{0, \dots, m-1\}$ using t independent hash functions $h_1, \dots, h_t$.
- Binary array B of size m (m is typically a prime).
- For the moment: only insertions and set membership.
Figure: key k is mapped by $h_1(k), h_2(k), \dots, h_t(k)$ to positions of B (indices 0 to m-1), which are set to 1.
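The slide assumes t independent hash functions onto [m]. Below is a minimal Python sketch of such a family, assuming (our choice, not the slides') that $h_j$ is simulated by BLAKE2b personalized with the index j. The function name `bloom_indices` is hypothetical; any family of good, independent-looking hash functions would do.

```python
import hashlib


def bloom_indices(key: str, t: int, m: int) -> list[int]:
    """Return [h_1(key), ..., h_t(key)], each in [m] = {0, ..., m-1}.

    Assumption: h_j is simulated by BLAKE2b personalized with j, so the t
    functions behave like distinct, independent-looking hash functions.
    """
    indices = []
    for j in range(1, t + 1):
        # `person` (at most 16 bytes) makes h_j differ from every other h_j'
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, person=j.to_bytes(16, "big")
        ).digest()
        # Reducing a 64-bit digest modulo m adds only a negligible bias
        indices.append(int.from_bytes(digest, "big") % m)
    return indices


# Example: t = 3 positions in a filter of m = 101 bits (101 is prime).
print(bloom_indices("hello", t=3, m=101))
```

The same key always yields the same positions, and $h_1, \dots, h_t$ do not depend on t, so a filter built with fewer functions uses a prefix of the same positions.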
7 Use of Bloom filters (object retrieval)
Figure: interaction over time between main memory (holding the Bloom filter) and the database: (1) ismember(k), (2) true, (3) retrieve(k), (4) obj(k).
Potential savings for retrieval (insertion/deletion):
- (3) and (4) do not occur if ismember(k) returns false
- the Bloom filter is stored in main memory
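The retrieval pattern in the figure can be sketched as follows. The wrapper class `FilteredDatabase` and the in-memory dict standing in for the database are hypothetical, and hashing uses salted BLAKE2b as an assumption. The point is that secondary storage is only touched when ismember(k) returns true, so steps (3) and (4) are skipped for keys the filter rules out.

```python
import hashlib


class BloomFilter:
    """Bit array B of size m with t hash functions (salted BLAKE2b, an assumption)."""

    def __init__(self, m: int, t: int):
        self.m, self.t = m, t
        self.B = bytearray(m)  # one byte per bit, for readability

    def _positions(self, key: str):
        for j in range(1, self.t + 1):
            d = hashlib.blake2b(key.encode(), digest_size=8,
                                person=j.to_bytes(16, "big")).digest()
            yield int.from_bytes(d, "big") % self.m

    def insert(self, key: str) -> None:
        for i in self._positions(key):
            self.B[i] = 1

    def ismember(self, key: str) -> bool:
        return all(self.B[i] for i in self._positions(key))


class FilteredDatabase:
    """Hypothetical wrapper: filter in main memory, dict stands in for slow storage."""

    def __init__(self, m: int, t: int):
        self.summary = BloomFilter(m, t)
        self._storage = {}
        self.storage_accesses = 0  # counts steps (3)-(4) of the figure

    def insert(self, key: str, obj) -> None:
        self.summary.insert(key)
        self._storage[key] = obj

    def retrieve(self, key: str):
        if not self.summary.ismember(key):  # (1)-(2): answered in main memory
            return None
        self.storage_accesses += 1          # (3)-(4): pay for secondary storage
        return self._storage.get(key)       # None here means a false positive


db = FilteredDatabase(m=1024, t=4)
db.insert("doc-42", {"title": "Bloom filters"})
print(db.retrieve("doc-42"), db.retrieve("doc-7"), db.storage_accesses)
```

A false positive still costs one storage access that returns nothing; the filter only guarantees that a negative answer is correct.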
8 Bloom filters: insertion and set membership

    insert(k)
    Require: k: object key
    for j = 1, ..., t do
        i = h_j(k)
        if B_i == 0 then
            B_i = 1
        end if
    end for

    ismember(k)
    Require: k: object key
    member = true; j = 1
    while member == true && j <= t do
        i = h_j(k)
        if B_i == 0 then
            member = false
        end if
        j = j + 1
    end while
    return member

Figure: Bloom filter: insertion and set membership (S is implicit).
Initially, $B_i = 0$ for every i. B is a compact summary of the keys of the elements in S.
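A direct Python translation of the two procedures, keeping the early exit of ismember. The hash functions are passed in as a list of callables; `h1` and `h2` in the usage lines are made-up examples modulo the prime m = 11, not functions from the slides.

```python
from typing import Callable, List, Sequence


class BloomFilter:
    """Bloom filter with bit array B and hash functions h_1, ..., h_t."""

    def __init__(self, m: int, hashes: Sequence[Callable[[int], int]]):
        self.m = m
        # h_1, ..., h_t, each returning a value in [0, m-1]
        self.hashes: List[Callable[[int], int]] = list(hashes)
        self.B = [0] * m  # initially B_i = 0 for every i

    def insert(self, k: int) -> None:
        for h in self.hashes:          # for j = 1, ..., t
            i = h(k)
            if self.B[i] == 0:
                self.B[i] = 1

    def ismember(self, k: int) -> bool:
        for h in self.hashes:
            if self.B[h(k)] == 0:      # a single 0 proves k was never inserted
                return False           # early exit, as in the while loop
        return True                    # all t bits set: k is (probably) in S


def h1(k: int) -> int:  # hypothetical example hash function for m = 11
    return (3 * k + 1) % 11


def h2(k: int) -> int:  # hypothetical example hash function for m = 11
    return (5 * k + 7) % 11


bf = BloomFilter(11, [h1, h2])
bf.insert(4)
print(bf.B, bf.ismember(4), bf.ismember(0))
```

The `if B_i == 0` test in insert only mirrors the pseudocode; setting a bit that is already 1 is harmless.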
9 False positives
- No false negatives, but...
- Assume $h_1(k) = (2k + 1) \bmod 5$ and $h_2(k) = (k + 2) \bmod 5$.
- ismember(4) returns true: a false positive.
Figure: Bloom filter with t = 2 and m = 5 after insertion of keys 5, 2, 3 (positions $h_1(k)$ and $h_2(k)$ set to 1).
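The slide's example can be replayed in a few lines, reading the second hash function as $h_2(k) = (k + 2) \bmod 5$. The script prints the bit array after inserting 5, 2, 3 and shows the false positive for key 4.

```python
M = 5  # size of the bit array


def h1(k: int) -> int:
    return (2 * k + 1) % M


def h2(k: int) -> int:
    return (k + 2) % M


B = [0] * M
for key in (5, 2, 3):        # insertion of keys 5, 2, 3
    for h in (h1, h2):
        B[h(key)] = 1


def ismember(k: int) -> bool:
    return all(B[h(k)] == 1 for h in (h1, h2))


print(B)                                  # [1, 1, 1, 0, 1]
print([ismember(k) for k in (5, 2, 3)])   # [True, True, True]: no false negatives
print(ismember(4))                        # True, although 4 was never inserted
```

Key 4 hashes to position 4 (via $h_1$), which was set by key 2, and to position 1 (via $h_2$), which was set by key 5.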
11 The mathematics of Bloom filters
Having false positives means that we might access the database even if it contains no element with the searched key. This can be acceptable if P[false positive] is small.
Probability of false positives:
- Assume n elements in the Bloom filter.
- Assume every $h_j(\cdot)$ is ideal, i.e., it hashes every item uniformly at random and independently of the others (for the sake of the analysis).
- Consider ismember(k), with $\mathrm{obj}(k) \notin S$. What is P[ismember(k) == true]? It is small if m is large enough.
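12 Fraction of 0s
- Assume ideal $h(\cdot)$s.
- Assume that, after n insertions, the fraction of 0s in B is p.
- Consider a key k with $\mathrm{obj}(k) \notin S$. Each of the t positions $h_j(k)$ is uniform and independent, and holds a 1 with probability $1 - p$, so $P[\mathrm{ismember}(k) == \mathrm{true}] = (1 - p)^t$.
The fraction of 0s determines the probability of a false positive. p is itself a random variable that depends on n, t and m.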
13 Fraction of 0s (cont.)
The $B_i$s are random variables that depend on the input and the hash functions. After n insertions we have:
$$P[B_i = 0] = \left(1 - \frac{1}{m}\right)^{tn}$$
$$E[p] = \frac{1}{m}\sum_{i=0}^{m-1} P[B_i = 0] = \left(1 - \frac{1}{m}\right)^{tn} \approx e^{-tn/m}$$
If X is the number of 0s, then $X = mp$ and $E[X] = mE[p]$.
Theorem ([Mitzenmacher, 2002]). Let X denote the number of 0s in the Bloom filter after n insertions. Then
$$P\big[|p - E[p]| > \epsilon\big] = P\big[|X - mE[p]| > \epsilon m\big] \le 2e^{-2\epsilon^2 m^2/(tn)}$$
14 Fraction of 0s (cont.)
Remarks:
- The $B_i$s are not statistically independent (why?).
- The proof uses an extension of Chernoff bounds.
- p is very close to E[p] with high probability. Example: if $m \ge 17\sqrt{nt}$, then $p \in [0.9E[p], 1.1E[p]]$ with probability at least 99% (verify this).
- In practice (see below), the condition above or a similar one is easy to satisfy.
In the rest of this section we assume that $p \approx E[p] \approx e^{-tn/m}$ deterministically. This can be made rigorous at the cost of some complication in the analysis.
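A quick Monte Carlo check of the two claims above, assuming ideal hash functions simulated with Python's `random` module: the fraction of zeros stays close to $E[p] = (1 - 1/m)^{tn} \approx e^{-tn/m}$ in every run. The values of m, n, t and the seed are arbitrary choices for illustration.

```python
import math
import random


def zero_fraction(m: int, n: int, t: int, rng: random.Random) -> float:
    """Fraction p of zero bits after n insertions with t ideal hash functions.

    Ideal hashing is simulated by drawing each of the t*n positions
    uniformly at random, as in the analysis.
    """
    B = [0] * m
    for _ in range(n * t):
        B[rng.randrange(m)] = 1
    return B.count(0) / m


m, n, t = 8000, 1000, 5
exact = (1 - 1 / m) ** (t * n)   # E[p]
approx = math.exp(-t * n / m)    # e^{-tn/m}
rng = random.Random(42)
samples = [zero_fraction(m, n, t, rng) for _ in range(20)]

print(f"E[p] = {exact:.4f}, e^(-tn/m) = {approx:.4f}")
print(f"observed p ranges over [{min(samples):.4f}, {max(samples):.4f}]")
```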
15 Choice of m and t
We have seen that, to a good approximation:
$$P[\mathrm{ismember}(k) == \mathrm{true}] = (1 - p)^t \approx \left(1 - e^{-tn/m}\right)^t$$
We can play with the parameters m (size of the Bloom filter) and t (number of hash functions). In the remainder of the analysis, we fix m and minimize
$$f(t) = \left(1 - e^{-tn/m}\right)^t$$
with respect to t (n is given, m is fixed). We next take $g(t) = \ln f(t) = t \ln\left(1 - e^{-tn/m}\right)$. Since ln is increasing, minimizing f(t) is equivalent to minimizing g(t), but the latter is easier.
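16 Choice of m and t (cont.)
We have:
$$\frac{dg}{dt} = \ln\left(1 - e^{-tn/m}\right) + \frac{tn}{m} \cdot \frac{e^{-tn/m}}{1 - e^{-tn/m}}$$
The derivative is 0 when $t = \frac{m}{n}\ln 2$ (then $e^{-tn/m} = 1/2$, and the two terms are $-\ln 2$ and $+\ln 2$), and this is a global minimum. With this choice:
$$P[\mathrm{ismember}(k) == \mathrm{true}] \approx f(t) = \left(\frac{1}{2}\right)^t \approx (0.6185)^{m/n}$$
Of course, the number t of hash functions has to be an integer.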
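A numerical sanity check of the calculus, for the illustrative choice m/n = 8: the derivative dg/dt vanishes at $t = \frac{m}{n}\ln 2$, a brute-force grid search over real t lands on the same point, and f there equals $(1/2)^t \approx 0.6185^{m/n}$.

```python
import math


def f(t: float, c: float) -> float:
    """f(t) = (1 - e^{-tn/m})^t, written with c = m/n."""
    return (1 - math.exp(-t / c)) ** t


def dg_dt(t: float, c: float) -> float:
    """Derivative of g(t) = ln f(t), as on the slide."""
    e = math.exp(-t / c)
    return math.log(1 - e) + (t / c) * e / (1 - e)


c = 8.0                      # m/n, an arbitrary example value
t_star = c * math.log(2)     # claimed minimizer (m/n) ln 2
# Independent check: brute-force minimum over t in (0, 20] with step 0.001
t_grid = min((x / 1000 for x in range(1, 20001)), key=lambda t: f(t, c))

print(t_star, t_grid, dg_dt(t_star, c))
print(f(t_star, c), 0.5 ** t_star, 0.6185 ** c)
```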
17 Recap
- n is given.
- For any given m, ideally $t = \frac{m}{n}\ln 2$; in practice $t = \left\lfloor \frac{m}{n}\ln 2 \right\rfloor$ or $t = \left\lceil \frac{m}{n}\ln 2 \right\rceil$.
- Highly effective if $m = cn$, with c a small constant. Example: c = 8 and t = 5 or 6 give a false-positive probability of about 0.02.
- Fixing m: in practice, choose a value a few times higher than the maximum predictable size of your database.
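Picking the integer t in practice can be sketched as follows: evaluate the approximate false-positive rate at the floor and at the ceiling of $\frac{m}{n}\ln 2$ and keep the better one. The helper names are hypothetical. For c = 8 this selects t = 6 with a rate of about 0.0216 (t = 5 gives about 0.0217), consistent with the figure of roughly 0.02 above.

```python
import math


def false_positive_rate(t: int, c: float) -> float:
    """Approximate P[false positive] = (1 - e^{-t/c})^t for m = c * n."""
    return (1 - math.exp(-t / c)) ** t


def best_integer_t(c: float) -> tuple[int, float]:
    """Round the ideal t = c ln 2 down or up, whichever gives fewer false positives."""
    t_ideal = c * math.log(2)
    candidates = {max(1, math.floor(t_ideal)), math.ceil(t_ideal)}
    t = min(candidates, key=lambda k: false_positive_rate(k, c))
    return t, false_positive_rate(t, c)


for c in (4, 8, 10, 16):
    t, rate = best_integer_t(c)
    print(f"m = {c}n: t = {t}, false-positive rate ~ {rate:.4f}")
```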
18 Recap (cont.)
Assume a database with $n = 10^6$ documents whose keys are document digests of 1 Kbit each, so the keys alone take about 128 MBytes. A retrieve operation can be very expensive, and caching can only partly mitigate this. Using m = 8n, we have a 1 MByte Bloom filter that occupies only a small fraction of main memory.
Still missing: deletions. These can be implemented at the expense of a moderate increase in memory.
19 Handling deletions
Substitute the binary array with a counter array (counting Bloom filter).
Figure: counting Bloom filter with t = 2 and m = 5 after insertion of keys 5, 2, 3 (the counters at positions $h_1(k)$ and $h_2(k)$ are incremented).
20 Counting Bloom filters: insertion and deletion

    insert(k)
    Require: k: object key
    for j = 1, ..., t do
        i = h_j(k)
        C_i = C_i + 1
    end for

    delete(k)
    Require: k: object key
    if ismember(k) then
        for j = 1, ..., t do
            i = h_j(k)
            C_i = C_i - 1
        end for
    end if

Figure: Counting Bloom filters: insertion and deletion (S is implicit).
ismember(k) is unchanged, with the test $B_i == 0$ replaced by $C_i == 0$.
It is possible to prove that 4 bits per counter suffice for most applications [Broder and Mitzenmacher, 2004].
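A minimal Python version of the counting Bloom filter. For simplicity the counters are plain Python ints rather than packed 4-bit counters. The usage reuses, as an assumption, the hash functions of the false-positive example, $h_1(k) = (2k + 1) \bmod 5$ and $h_2(k) = (k + 2) \bmod 5$, since the figure for this slide does not state them.

```python
from typing import Callable, Sequence


class CountingBloomFilter:
    """Counter array C of size m with hash functions h_1, ..., h_t."""

    def __init__(self, m: int, hashes: Sequence[Callable[[int], int]]):
        self.C = [0] * m
        self.hashes = list(hashes)

    def insert(self, k: int) -> None:
        for h in self.hashes:
            self.C[h(k)] += 1

    def ismember(self, k: int) -> bool:
        # Same test as the plain filter, with "C_i == 0" in place of "B_i == 0"
        return all(self.C[h(k)] > 0 for h in self.hashes)

    def delete(self, k: int) -> None:
        # Guarded by ismember, as in the pseudocode. Deleting a key that was
        # never inserted but tests positive (a false positive) still
        # decrements counters and can later cause false negatives.
        if self.ismember(k):
            for h in self.hashes:
                self.C[h(k)] -= 1


def h1(k: int) -> int:
    return (2 * k + 1) % 5


def h2(k: int) -> int:
    return (k + 2) % 5


cbf = CountingBloomFilter(5, [h1, h2])
for key in (5, 2, 3):
    cbf.insert(key)
print(cbf.C)                                     # [2, 1, 2, 0, 1]
cbf.delete(5)
print(cbf.C, cbf.ismember(5), cbf.ismember(2))   # [2, 0, 1, 0, 1] False True
```

After deleting 5, its position 1 drops back to 0, so 5 is no longer reported, while 2 and 3 are still members.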
21 Applications [Broder and Mitzenmacher, 2004]
- Database maintenance (since the early 80s)
- Cooperative distributed caching (see also [Fan et al., 2000])
- P2P/overlay networks
- Resource routing
- Packet routing
22 Summary cache [Fan et al., 2000]
Internet Cache Protocol (ICP): proxies cooperate.
23 Summary cache (cont.)
On a cache miss, a proxy contacts its neighbour proxies instead of requesting the page from the Web server. ICP traffic can cause great overhead, even for a few proxies.
Idea:
- Each proxy stores a (counting) Bloom filter of every other proxy's contents.
- Keys are the URLs.
- On a cache miss:
  1. Check the locally stored Bloom filters for key membership.
  2. Contact a proxy whose Bloom filter is positive for the key.
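The lookup step on a cache miss can be sketched as below. This is an illustration under assumptions, not the protocol of [Fan et al., 2000]: proxy names and URLs are made up, summaries are plain bit-array Bloom filters (no counting, no update messages), hashing is salted BLAKE2b, and `proxies_to_contact` is a hypothetical helper.

```python
import hashlib


class BloomFilter:
    """Bit-array summary of one proxy's cached URLs."""

    def __init__(self, m: int, t: int):
        self.m, self.t = m, t
        self.B = bytearray(m)

    def _positions(self, url: str):
        for j in range(1, self.t + 1):
            d = hashlib.blake2b(url.encode(), digest_size=8,
                                person=j.to_bytes(16, "big")).digest()
            yield int.from_bytes(d, "big") % self.m

    def insert(self, url: str) -> None:
        for i in self._positions(url):
            self.B[i] = 1

    def ismember(self, url: str) -> bool:
        return all(self.B[i] for i in self._positions(url))


def proxies_to_contact(url: str, summaries: dict) -> list:
    """Step 1: check the locally stored summaries; step 2: return the proxies to ask."""
    return [name for name, bf in summaries.items() if bf.ismember(url)]


# Hypothetical neighbours and their cached pages.
summaries = {"proxy-a": BloomFilter(4096, 4), "proxy-b": BloomFilter(4096, 4)}
summaries["proxy-a"].insert("http://example.com/index.html")
summaries["proxy-b"].insert("http://example.org/news.html")

print(proxies_to_contact("http://example.com/index.html", summaries))
print(proxies_to_contact("http://example.net/", summaries))  # usually []: go to the Web server
```

A false positive only costs one useless query to a neighbour. An empty list means that, according to the stored summaries, no neighbour has the page; in a real deployment summaries can be stale between updates, so this is not a guarantee.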
24 Questions
Q1. Consider two dictionaries over the same universe of objects (and therefore keys). Describe how Bloom filters allow one to easily construct a compact summary of their union, and why.
Q2. A dictionary in secondary storage holds n items, with no insertions or deletions. retrieve(k) costs the time to access disk; access to main memory is negligible. 70% of the requested items are not in the dictionary. Let T be the response time. Design a Bloom filter such that the speed-up in E[T] is at least 2, i.e., a 100% improvement.
26 Example: spell-checker
Text editor spell-checker: it must quickly report spelling mistakes to the user. The thesaurus contains $10^5$ terms, with an average term length of 10 bytes. Design a Bloom filter that performs spell-checking with probability of error at most 0.01.
Solution: impose $(0.6185)^{m/n} \le 0.01$, which gives $m/n \ge 9.59$. With $m = 10^6$ bits (1 Mbit), $m/n = 10$ and $t = \frac{m}{n}\ln 2 \approx 6.93$, so we can use a Bloom filter of size 1 Mbit with 7 hash functions. Note that storing all words requires 1 MByte plus the data structure.
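The arithmetic of the solution can be checked in a few lines. The sketch uses the $0.6185^{m/n}$ approximation to find the minimum m/n, then evaluates the exact expression $(1 - e^{-tn/m})^t$ for the chosen m = 1 Mbit and t = 7.

```python
import math

n = 10**5          # terms in the thesaurus
target = 0.01      # acceptable probability of error

# Smallest m/n with 0.6185^(m/n) <= target
ratio = math.log(target) / math.log(0.6185)
t = round(ratio * math.log(2))   # nearest integer to the ideal t = (m/n) ln 2

m_bits = 10**6                   # the slide's choice: 1 Mbit
achieved = (1 - math.exp(-t * n / m_bits)) ** t

print(f"m/n >= {ratio:.2f}, t = {t}, error probability with 1 Mbit = {achieved:.4f}")
print("Bloom filter bytes:", m_bits // 8, "| raw thesaurus bytes:", n * 10)
```

With these numbers the filter needs 125,000 bytes, about one eighth of the 1 MByte needed just to store the terms.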
27 References
Broder, A. and Mitzenmacher, M. (2004). Network applications of Bloom filters: A survey. Internet Mathematics, A K Peters, Ltd., volume 1.
Fan, L., Cao, P., Almeida, J., and Broder, A. Z. (2000). Summary cache: a scalable wide-area Web cache sharing protocol. IEEE/ACM Transactions on Networking, 8(3).
Mitzenmacher, M. (2002). Compressed Bloom filters. IEEE/ACM Transactions on Networking, 10(5).
More informationCourse : Data mining
Course : Data mining Lecture : Mining data streams Aristides Gionis Department of Computer Science Aalto University visiting in Sapienza University of Rome fall 2016 reading assignment LRU book: chapter
More informationEfficient Decentralized Algorithms for the Distributed Trigger Counting Problem
Efficient Decentralized Algorithms for the Distributed Trigger Counting Problem Venkatesan T. Chakaravarthy 1, Anamitra R. Choudhury 1, Vijay K. Garg 2, and Yogish Sabharwal 1 1 IBM Research - India, New
More informationSubway : Peer-To-Peer Clustering of Clients for Web Proxy
Subway : Peer-To-Peer Clustering of Clients for Web Proxy Kyungbaek Kim and Daeyeon Park Department of Electrical Engineering & Computer Science, Division of Electrical Engineering, Korea Advanced Institute
More informationAcknowledgement HashTable CISC4080, Computer Algorithms CIS, Fordham Univ.
Acknowledgement HashTable CISC4080, Computer Algorithms CIS, Fordham Univ. Instructor: X. Zhang Spring 2018 The set of slides have used materials from the following resources Slides for textbook by Dr.
More informationCS1020 Data Structures and Algorithms I Lecture Note #15. Hashing. For efficient look-up in a table
CS1020 Data Structures and Algorithms I Lecture Note #15 Hashing For efficient look-up in a table Objectives 1 To understand how hashing is used to accelerate table lookup 2 To study the issue of collision
More informationOne-Pass Streaming Algorithms
One-Pass Streaming Algorithms Theory and Practice Complaints and Grievances about theory in practice Disclaimer Experiences with Gigascope. A practitioner s perspective. Will be using my own implementations,
More informationA TLV-Structured Data Naming Scheme for Content- Oriented Networking
A TLV-Structured Data Naming Scheme for Content- Oriented Networking Hang Liu InterDigital Communications, LLC 781 Third Avenue King of Prussia, PA 19406 Dan Zhang WINLAB, Rutgers University 671 Route
More informationFigure 1: An example of a hypercube 1: Given that the source and destination addresses are n-bit vectors, consider the following simple choice of rout
Tail Inequalities Wafi AlBalawi and Ashraf Osman Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV fwafi,osman@csee.wvu.edug 1 Routing in a Parallel Computer
More informationMain-Memory Databases 1 / 25
1 / 25 Motivation Hardware trends Huge main memory capacity with complex access characteristics (Caches, NUMA) Many-core CPUs SIMD support in CPUs New CPU features (HTM) Also: Graphic cards, FPGAs, low
More information