
Faster -reindex by initially deserializing only headers
Closed, Public

Authored by PiRK on Apr 12 2024, 13:15.

Details

Reviewers
Fabien
Group Reviewers
Restricted Project
Commits
rABCf3f193270b70: Faster -reindex by initially deserializing only headers
Summary

When a block is initially read from a blk*.dat file during reindexing,
it can be added to the block index only if all of its ancestor blocks
have been added, which is rare. If the block's ancestors have not been
added, the block must be re-read from disk later when it can be added.

This commit deserializes only the block header during the initial read,
rather than the entire block, since the header is sufficient to determine
whether the block's parent (and thus all of its ancestors) has been added.
This is a performance improvement.
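
The header-first approach described above can be sketched in Python. This is a toy model, not the code in src/validation.cpp: the record layout (4-byte marker, 4-byte little-endian size, block bytes), the placeholder `MAGIC` value, the `make_block` and `write_record` helpers, and the use of an all-zero previous-block hash to identify the genesis block are all assumptions made for illustration. Only the 80-byte header is parsed on the first pass; a block whose parent is unknown is remembered by file position and read in full later, once its parent has been added.

```python
import hashlib
import io
import struct
from collections import defaultdict, deque

HEADER_SIZE = 80
MAGIC = b"MAGC"  # placeholder marker, NOT the real disk magic bytes


def block_hash(header: bytes) -> bytes:
    """Double SHA-256 of the 80-byte header."""
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()


def make_block(prev_hash: bytes, payload: bytes) -> bytes:
    """Hypothetical helper: toy block = 80-byte header + opaque payload."""
    header = (struct.pack("<I", 1) + prev_hash
              + hashlib.sha256(payload).digest() + bytes(12))
    return header + payload


def write_record(f, block: bytes) -> None:
    """Hypothetical helper: append one marker/size/block record."""
    f.write(MAGIC + struct.pack("<I", len(block)) + block)


def load_external_block_file(f, index: dict) -> int:
    """Scan records, parsing only headers until a block can be connected.

    `index` maps block hash -> full block bytes.
    Returns the number of full block reads performed.
    """
    pending = defaultdict(list)  # parent hash -> [(offset, size), ...]
    full_reads = 0

    def accept(offset: int, size: int) -> bytes:
        # Full read: only done once the parent is known.
        nonlocal full_reads
        f.seek(offset)
        block = f.read(size)
        full_reads += 1
        h = block_hash(block[:HEADER_SIZE])
        index[h] = block
        return h

    while True:
        prefix = f.read(8)
        if len(prefix) < 8:
            break
        if prefix[:4] != MAGIC:
            raise ValueError("unexpected record marker")
        size = struct.unpack("<I", prefix[4:])[0]
        offset = f.tell()
        header = f.read(HEADER_SIZE)   # header only, not the whole block
        prev_hash = header[4:36]
        next_record = offset + size

        if prev_hash == bytes(32) or prev_hash in index:
            # Parent known: add this block, then any waiting descendants.
            queue = deque([accept(offset, size)])
            while queue:
                parent = queue.popleft()
                for child_offset, child_size in pending.pop(parent, []):
                    queue.append(accept(child_offset, child_size))
        else:
            # Parent unknown: remember the position and skip the body.
            pending[prev_hash].append((offset, size))
        f.seek(next_record)
    return full_reads
```

In this sketch each block body is read exactly once, even when the file stores blocks in reverse order; without the header-only first pass, every out-of-order block would be fully deserialized twice.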

Benchmark (to be compared to D15959):

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|       11,230,665.00 |               89.04 |    1.1% |      0.13 | `LoadExternalBlockFile`

More benchmark data from the PR description:

This reduces reindex time on mainnet by 7 hours on a Raspberry Pi, which translates to around a 25% reduction in the first part of reindexing (adding blocks to the index), and about a 6% reduction in overall reindex time.

This concludes the backport of core#16981
https://github.com/bitcoin/bitcoin/pull/16981/commits/db929893ef0bc86ea2708cdbcf41152240cd7c73
Depends on D15960

Test Plan

ninja all check-all

Event Timeline

PiRK requested review of this revision. Apr 12 2024, 13:15
src/validation.cpp
5189–5191

This line would have been added in D15004 if we had backported core#24957 in the right order (after this one)

test/functional/test_framework/p2p.py
135–146

We use the same magic bytes as Core for disk serialization, but different ones for network messages. See D400.

This revision is now accepted and ready to land. Apr 12 2024, 14:00