More efficient bitsliced rolling Bloom filter
This patch changes the implementation from one that stores 16 2-bit integers
in one uint32_t's, to one that stores the first bit of 64 2-bit integers in
one uint64_t and the second bit in another. This allows for 450x faster
refreshing and 2.2x faster average speed.