HomePhabricator

[Chronik] Add `TxNumCache` to speed up `prepare_indexed_txs`

Description

[Chronik] Add TxNumCache to speed up prepare_indexed_txs

Summary:
Currently, prepare_indexed_txs is still one of the slowest parts of the indexer due to the disk access, even when using batched queries.

We add TxNumCache, which internally works like a "conveyor belt" of buckets that are filled with TxId -> TxNum.

Each block puts all their TxId -> TxNum tuples into the bucket at the front of the belt, one-by-one. If this bucket is full, the bucket at the end of the belt is "dropped off", emptied and moved to the front, moving all other buckets one step back. Then the new empty bucket will be filled, until it is full, etc.

The required memory will be 40B per TxId (32-byte txid + 8-byte TxNum, +small HashMap overhead), so a 100000 tx bucket will require 4MB; meaning 10 buckets require 40MB, which is the default. Users can configure the number of buckets (-chroniktxnumcachebuckets) and bucket size (-chroniktxnumcachebucketsize) to tune Chronik to their machine; the default of 40MB strikes a good balance of RAM usage and performance.

Benchmarks for the first 300000ish blocks show a significant speedup of 5.24x when using a bucket size of 100k (40MB RAM), and a speedup of 2.26x (4MB RAM) when using a bucket size of 10k. Based on this, a bucket size of 100k was chosen. One benchmark with 100 buckets and bucket size 100k (400MB RAM) showed a speedup of 7.4x, but RAM usage is excessive.

Using 5 buckets and a bucket size of 200k is slower than 10 buckets a 100k. Using 10 buckets a 200k barely speeds up indexing but uses 2x the RAM.

I also benchmarked using the faster seahash hashing algorithm for HashMap, but it made no big difference, so for now it is left at the slower but cryptographically secure default hash. This indicates that we're not compute but memory bound, which is to be expected. In the future, we might want to use a faster hashing algorithm.

bench300k total time [s]300k time prepare [s]
noprepare1066.850
old, before D155714375.393308.54
base3148.512081.66
d10-b10k1988.43921.58
d10-b25k1672.90606.05
d10-b100k1464.02397.17
d10-b200k1432.11365.26
d100-b10k1664.05597.20
d100-b100k1348.24281.39
d10-b10k-sea1907.64840.79
d10-b100k-sea1488.08421.23
d5-b200k1648.01581.16

For reference, a previous version of this used a simpler approach, where the cache simply stores the last depth_blocks worth of tx nums in a VecDeque of HashMaps, like a queue of blocks. The oldest block of txids was dropped off at the end, and the new block is added at the beginning. Benchmarks however showed this is slower than using the conveyor belt cache. Benchmarks are kept in this diff for future reference.

bench300k total time [s]300k time prepare [s]400k total time [s]400k time prepare [s]
no prepare1066.8503476.100
old, before D155714375.393308.54
D155713148.512081.6611453.997977.89
block queue, depth = 102749.211682.368595.975119.87
block queue, depth = 1002297.971231.127214.193738.08
block queue, depth = 10002969.341902.4910865.727389.62

Test Plan: ninja check-functional

Reviewers: Fabien, #bitcoin_abc

Reviewed By: Fabien, #bitcoin_abc

Differential Revision: https://reviews.bitcoinabc.org/D15619

Details

Provenance
tobias_ruckAuthored on Mar 5 2024, 17:16
tobias_ruckPushed on Mar 6 2024, 15:08
Reviewer
Restricted Project
Differential Revision
D15619: [Chronik] Add `TxNumCache` to speed up `prepare_indexed_txs`
Parents
rABC6218b29bc826: [guix] Make x86_64-w64-mingw32 builds reproducible
Branches
Unknown
Tags
Unknown