Page MenuHomePhabricator

mempool: Add topological index, enforce consistency, updated reorg logic
ClosedPublic

Authored by Fabien on Feb 15 2023, 16:49.

Details

Summary
In order to ween ourselves off of the quadratic stats, we must first
make a few changes.

- Introduce a new topological index, based off a new property of
  `CTxMemPoolEntry`, `entryId`. This is a monotonically increasing and
  unique value (much how `NodeId` works in the network layer).
- This index paves the way for us to remove the quadratic ancestor stats.
- This index is now used for relay (and the ancestor score sorter was removed).
- Reorg logic has been redone and simplified -- the performance gains
  from this are not yet fully realized but will be realized in future commits.
  - No more inconsistent mempools on reorg! To accomplish this we leverage
    the `DisconnectedBlockTransactions` pool as a topological sorter and we
    toss all mempool tx's into it on reorg, temporarily clearing the mempool.
  - Unwound blocks get added to this disconnected tx pool as before.
  - Unwinding farther than 10 blocks does not resurrect block tx's >10 blocks
    ago (same as before).
  - When unwinding is done, the tx's in the disconnect pool are added back
    to the mempool via the central and well tested ATMP code path, in
    topological order (care is taken to preserve any original entry timestamps
    and fee deltas that we know of).
  - Note that the disconnect pool has a default size limit that's 20x excessive
    block size, or 640MB (this behavior is unchanged by this MR).
  - Previously the mempool was being "updated for reorg" with each unwound block
    using a very expensive scan of the entire mempool, re-checking each tx
    against the tip. On very long reorgs this could get very costly.
    There is no need to do this for each unwound block, each time.
    It can be done once at the end "naturally" by calling ATMP.
    On top of having a performance benefit -- this has the benefit that code
    that checks tx sanity for mempool now lives in only 1 place, rather than two
    (note how we were able to delete a bunch of redundant code).
  - This change ensures:
    - That the topological index "entry id" is always correct
    - Invariants are never violated -- the mempool is consistent at all times.
    - Allowed us to delete expensive cleanup/reconciliation code that
      existed in `CTxMemPool` (which was used on reorg only).
- Implemented a "TODO" that was left in `addUnchecked` from Core.
- Updated comments

[..]

You won't see too significant a speedup yet this commit, but in future commits
the work we do here will ensure this gets faster.

Port of bchn#1128 and bchn#1158.

Depends on D13123.

This introduces 2 subtle changes that have not been catched by bchn due
to the lack of functional tests for these features:

  • The validation callback for tx removal is now called during a block disconnect (the mempool is cleared in DisconnectedBlock::importMemPool) which triggers new ZMQ messages. This is actually an improvement over the previous iteration because a transaction evicted from the mempool due to missing parent in the event of a reorg previously had no message. The test has been updated accordingly.
  • This removes an optimization that allowed the mempool descendant count to temporarily overflow during a reorg (see https://github.com/bitcoin/bitcoin/pull/21464#pullrequestreview-636049781). The mempool_updatefromblock.py test has been updated to increase the limit so that the test behavior remains the same but no overflow occurs(the descendant limit is not what is tested here).

Another difference is that we used to have a guard to prevent large
reorgs from causing an OOM due to the unbounded validation queue (see
D7213). However due to locking requirements we cannot apply this patch
anymore so it's left as a TODO.

Note that the mempool check() method has been updated to undo some of
the changes from D12173, since this diff breaks the ordering assumption.

Test Plan
ninja all check-extended

Run the sanitizer builds.

Diff Detail

Repository
rABC Bitcoin ABC
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Fabien requested review of this revision.Feb 15 2023, 16:49
sdulfari requested changes to this revision.Feb 15 2023, 19:32
sdulfari added a subscriber: sdulfari.

first pass

src/txmempool.h
925 ↗(On Diff #37905)

why?

src/validation.cpp
3289 ↗(On Diff #37905)

Can be another diff but this can be moved to util

3339 ↗(On Diff #37905)

Just an observation for now: we are moving to locking cs_main for this whole function. This makes ParkBlock() more useful in contexts where cs_main is already locked (like ActivateBestChainStep). I ran into this issue a few times while designing the miner fund policy changes.

We should consider if flushing the validation queue is the responsibility of this function or the caller. For the latter, a wrapper can handle locking and occasionally flush the queue when handling unusually large reorgs done by RPC.

3387–3395 ↗(On Diff #37905)

Move this out of the loop since it does not loop.

This revision now requires changes to proceed.Feb 15 2023, 19:32
src/txmempool.h
925 ↗(On Diff #37905)

Because it's done in the source material... This is the same behavior as the previous implementation but I suppose it saves a couple instructions since the iterator is not deferred. Looks like moot optimization to me.
Actually I will revert because it creates an assumption about the inner type of queuedTx which wasn't the case before, so it's more brittle now.

src/validation.cpp
3289 ↗(On Diff #37905)

Agreed. I have several minor improvements on my todo list already but I want to complete the feature first, so I'm leaving this for another diff

3339 ↗(On Diff #37905)

I don't think we're moving toward more cs_main locking but rather the opposite as it's a key parameter for performance improvements.

We should consider if flushing the validation queue is the responsibility of this function or the caller.

It's not about flushing but preventing it from growing due to the successive calls to DisconnectedTip(). Since this is where the loop resides you can't move it to the callsite.

3387–3395 ↗(On Diff #37905)

I considered this but you don't want to run this if pindex is not in the chain, as you won't reorg anything but clear/refill your mempool for no reason, triggering the validation interface spuriously. Even if it's not a big deal it's better to avoid it if possible.

Don't assume the queueTx type

Fabien edited the summary of this revision. (Show Details)

Don't drop unprioritized txs during reorgs

This revision is now accepted and ready to land.Feb 20 2023, 21:43