A Bloom filter 3 is a data structure for representing a set, with some probability of false…
A Bloom filter3 is a data structure for representing a set, with some probability of false positives but with no false negatives. That is, the data structure is used to answer questions of the form “is x ∈ S?,” and there is some likelihood that the answer will be that x ∈ S when in fact x ∈/ S, but there is no possibility that the answer will be that x ∈/ S when in fact x ∈ S. As a tradeoff against the possibility of false positives, a Bloom filter can be much smaller than the set it represents.
A Bloom filter consists of a bit array of length m, together with k hash functions h1, …, hk (“hash functions” were introduced in Problem 1.13). Each hash function hj maps a set element (a string, perhaps) to a position in the array, called its hash value. The functions hj assign elements to positions in such a way that (1) each element has equal probability of being mapped to any of the m possible positions; (2) the hash values of any hash function for different elements are independent of each other; and (3) the hash functions are themselves independent of each other. Initially, all m positions in the bit array are 0. To insert an element e, compute the hash values h1(e), …, hk(e) and set those bit positions to 1. Then to check whether an element e is in the set, compute h1(e), …, hk(e); if at least one position hj(e) has value 0, e is absent; if all of the positions hj(e) have value 1, e is probably present. The memory efficiency of a Bloom filter comes because an element is represented by the distribution of positions that are the hash values of the hash functions. The element itself is not stored at all.
Suppose n elements have been inserted into a Bloom filter, and we wish to check for the presence of an element e’that has not been inserted. Calculate the probability of a false positive, in terms of m, k, and n. What is this probability if m = 1000, k = 10, and n = 100?