* Benchmark Merkle root computation
* Refactor SHA256 code
* Specialized double sha256 for 64 byte inputs
* Use SHA256D64 in Merkle root computation
* 4-way SSE4.1 implementation for double SHA256 on 64-byte inputs
* 8-way AVX2 implementation for double SHA256 on 64-byte inputs
* [MOVEONLY] Move unused Merkle branch code to tests
* Enable double-SHA256-for-64-byte code on 32-bit x86
* For AVX2 code, also check for AVX, XSAVE, and OS support
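Why a 64-byte specialization helps Merkle roots: every interior node hashes exactly two 32-byte child hashes, so each level of the tree is a batch of independent double-SHA256 calls on 64-byte inputs. The sketch below shows that structure, assuming Python's `hashlib`. The helper names `sha256d64` and `merkle_root` are hypothetical illustrations, not the C++ API in this patch. Here the batch is processed serially; the SSE4.1 and AVX2 variants described above would instead hash 4 or 8 of those inputs in parallel lanes. Consensus-related mutation detection (duplicate-subtree checks) is omitted.

```python
import hashlib
from typing import List


def sha256d(data: bytes) -> bytes:
    """Double SHA256: SHA256(SHA256(data))."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def sha256d64(inputs: bytes, blocks: int) -> bytes:
    """Hypothetical helper: double-SHA256 `blocks` consecutive 64-byte inputs.

    Returns the `blocks` 32-byte digests concatenated. This sketch loops
    serially; a SIMD implementation would process 4 (SSE4.1) or 8 (AVX2)
    inputs at once.
    """
    assert len(inputs) == 64 * blocks
    out = bytearray()
    for i in range(blocks):
        out += sha256d(inputs[64 * i : 64 * (i + 1)])
    return bytes(out)


def merkle_root(leaves: List[bytes]) -> bytes:
    """Hypothetical sketch of a Merkle root over 32-byte leaf hashes."""
    if not leaves:
        return b"\x00" * 32  # an empty tree yields an all-zero hash
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        # Adjacent pairs of 32-byte hashes form the 64-byte inputs.
        digests = sha256d64(b"".join(level), len(level) // 2)
        level = [digests[32 * i : 32 * (i + 1)] for i in range(len(level) // 2)]
    return level[0]
```

Because each level is a single batched call, the per-level work maps directly onto a multi-way kernel without changing the tree logic.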
This is a backport of Bitcoin Core PR13191, PR13393, and PR13471.
Depends on D1844 and D1845.