Differential D3356 Diff 9503 src/cuckoocache.h

Changeset View

Standalone View

src/cuckoocache.h

Show First 20 Lines • Show All 204 Lines • ▼ Show 20 Lines	private:
* a nonce).		* a nonce).
*/		*/
const Hash hash_function;		const Hash hash_function;

/**		/**
* compute_hashes is convenience for not having to write out this expression		* compute_hashes is convenience for not having to write out this expression
* everywhere we use the hash values of an Element.		* everywhere we use the hash values of an Element.
*		*
		* We need to map the 32-bit input hash onto a hash bucket in a range [0,
		* size) in a manner which preserves as much of the hash's uniformity as
		* possible. Ideally this would be done by bitmasking but the size is
		* usually not a power of two.
		*
		* The naive approach would be to use a mod -- which isn't perfectly uniform
		* but so long as the hash is much larger than size it is not that bad.
		* Unfortunately, mod/division is fairly slow on ordinary microprocessors
		* (e.g. 90-ish cycles on haswell, ARM doesn't even have an instruction for
		* it.); when the divisor is a constant the compiler will do clever tricks
		* to turn it into a multiply+add+shift, but size is a run-time value so the
		* compiler can't do that here.
		*
		* One option would be to implement the same trick the compiler uses and
		* compute the constants for exact division based on the size, as described
		* in "{N}-bit Unsigned Division via {N}-bit Multiply-Add" by Arch D.
		* Robison in 2005. But that code is somewhat complicated and the result is
		* still slower than other options:
		*
		* Instead we treat the 32-bit random number as a Q32 fixed-point number in
		* the range [0,1) and simply multiply it by the size. Then we just shift
		* the result down by 32-bits to get our bucket number. The results has
		* non-uniformity the same as a mod, but it is much faster to compute. More
		* about this technique can be found at
		* http://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
		*
		* The resulting non-uniformity is also more equally distributed which would
		* be advantageous for something like linear probing, though it shouldn't
		* matter one way or the other for a cuckoo table.
		*
		* The primary disadvantage of this approach is increased intermediate
		* precision is required but for a 32-bit random number we only need the
		* high 32 bits of a 32*32->64 multiply, which means the operation is
		* reasonably fast even on a typical 32-bit processor.
		*
* @param e the element whose hashes will be returned		* @param e the element whose hashes will be returned
* @returns std::array<uint32_t, 8> of deterministic hashes derived from e		* @returns std::array<uint32_t, 8> of deterministic hashes derived from e
*/		*/
inline std::array<uint32_t, 8> compute_hashes(const Element &e) const {		inline std::array<uint32_t, 8> compute_hashes(const Element &e) const {
return {		return {
{uint32_t(		{uint32_t(
(hash_function.template operator()<0>(e) * uint64_t(size)) >>		(hash_function.template operator()<0>(e) * uint64_t(size)) >>
32),		32),
▲ Show 20 Lines • Show All 250 Lines • Show Last 20 Lines