Page MenuHomePhabricator

fix BIP37 processing for non-topologically ordered blocks
ClosedPublic

Authored by markblundeberg on Jun 19 2019, 23:57.

Details

Summary

CMerkleBlock::CMerkleBlock called IsRelevantAndUpdate() on each transaction
in order, which (due to the outpoint-adding feature) assumes a topological
order of transactions in order to work correctly. If an outpoint of interest were
created later in a block than when gets spent, then it would be added into
the bloom filter too late, and thus that spending transaction would get missed.

This changes CMerkleBlock::CMerkleBlock to scan all outputs first, then
make a second loop to scan all inputs. This requires breaking up the
IsRelevantAndUpdate() function into two parts. The original
IsRelevantAndUpdate() functionality remains fully intact however, as it
gets called from mempool-related code (mempool has topological order)
and tests.

Note that vMatchedTxn.push_back is moved into the second loop so that
index i keeps ascending order, in case that is somehow relevant. (the tests
at least do check this)

A two-loop construction like this will very slightly increase the false
positive rate.

Depends on D3399

Test Plan

make check

Diff Detail

Repository
rABC Bitcoin ABC
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

src/bloom.cpp
142 ↗(On Diff #9546)

Noted in telegram: "HasRelevantOutputsAndUpdate" is a poor name for this function because it does this too.

bvk added inline comments.
src/merkleblock.cpp
28 ↗(On Diff #9546)

How about we remove txid check in HasRelevantOutputsAndUpdate and adjust the call-site as follows:

for (size_t i = 0; i < block.vtx.size(); i++) {
  const bool matchID = filter.Contains(tx->GetId());
  const bool matchOutputs = filter.HasRelevantOutputsAndUpdate(*tx);
  vMatch.push_back(matchID||matchOutputs);
}

The new inline function IsRelevantAndUpdate could become:

bool IsRelevantAndUpdate(const CTransaction &tx) {
    const bool matchID = Contains(tx.GetId());
    const bool matchOutputs = HasRelevantOutputsAndUpdate(tx);
    if (matchID || matchOutputs) {
        return true;
    }
    return HasRelevantInputs(tx);
}

This is definitely longer -- I am not sure how strict we want to go for issues like these :)

src/merkleblock.cpp
28 ↗(On Diff #9546)

Actually, my version is buggy -- Outpoints are ignored when only TxID matches. Please ignore it :|

src/merkleblock.cpp
28 ↗(On Diff #9546)

Hmm I think it would work as you have written, since the HasRelevantOutputsAndUpdate is not short-circuited out. Anyway I'd prefer to stick with logic that I wrote (just to change minimally), and come up with a better name for HasRelevantOutputsAndUpdate :)

@bvk How about IsRelevantBesidesInputsAndUpdate? 😄

Alternatively it would work to move the txid check to the second function, making them:

  • HasRelevantOutputsAndUpdate
  • HasRelevantTxidOrInputs

However I like that the code changes to bloom.cpp are so minimal in the current diff. It's manifestly obvious that IsRelevantAndUpdate performs the exact same order of computations. Though, the changes you suggest make that fairly clear as well.

(Note that moving the txid check after the update for outputs is technically a behavioural change that also very slightly increases false detections, even for mempool.)

@bvk How about IsRelevantBesidesInputsAndUpdate? 😄

I think this is just fine.

This revision is now accepted and ready to land.Jun 21 2019, 14:34

change name to IsRelevantBesidesInputsAndUpdate, also clarify comments.

Returning for at least a rubber-stamp review from ABC member. :-)

deadalnix requested changes to this revision.Jun 22 2019, 23:26
deadalnix added inline comments.
src/bloom.cpp
130 ↗(On Diff #9563)

These names are completely obtuse.

MatchAndInsertOutputs or something is a much better name. Naming the function by what it doesn't do is really not helpful.

144 ↗(On Diff #9563)

That's probably out of scope, but this can be avoided in many cases by checking it at the end of the loop rather than now.

return fFound || contains(txid);
182 ↗(On Diff #9563)

MatchInputs

Whoever came up with the relevant terminology clearly has a very confusing mental model.

187 ↗(On Diff #9563)

This will early bail is the filter is full, but you'll go over all the inputs if the filter is empty, which is a waste of time. You should early bail in that case.

src/merkleblock.cpp
21 ↗(On Diff #9563)

You can use a C++11 style loop hereas you don't need the index.

26 ↗(On Diff #9563)
vMatch.push_back(filter.IsRelevantBesidesInputsAndUpdate(*tx));
32 ↗(On Diff #9563)

Presumably, the txid is a TxId.

35 ↗(On Diff #9563)

This whole thing is fairly inelegant.

if (!vMatch[i]) {
    vMatch[i] = filter.HasRelevantInputs(*tx);
}
if (vMatch[i]) {
    vMatchedTxn.push_back(std::make_pair(i, txid));
}

Something like this would be less strange that creating a temporary to immediately assign it and then use it.

src/test/bloom_tests.cpp
623 ↗(On Diff #9563)

size_t

624 ↗(On Diff #9563)

Braces

655 ↗(On Diff #9563)

Shuffle these check back in the right order and remove the "was" comments.

660 ↗(On Diff #9563)

braces and size_t

You'd benefit from doing a pass over these files that clearly aren't updated as a first step instead of propagating the bad stuff. It's not like we expect the block to have more than 4B transactions, but who knows, maybe one day, and writing correct code is not more expensive.

This revision now requires changes to proceed.Jun 22 2019, 23:26
src/bloom.cpp
187 ↗(On Diff #9563)

Ah yes, good catch!

src/test/bloom_tests.cpp
660 ↗(On Diff #9563)

Yep I'll do these in a separate pass.

BTW while changing, I realized maybe also lines like std::vector<unsigned int> vIndex; and the non-test code could be changed too. However, it may be a bit pointless since 32 bits for transaction count is baked into bip37:
https://github.com/Bitcoin-ABC/bitcoin-abc/blob/2f7b26a0003663e12fa6f157e8a129b75def8ede/src/merkleblock.h#L44
(in merkleblock.h you can also see lots of unsigned ints around)

I'm pretty sure bip37 will be long gone before we ever hit 4B transaction per block (as it will be way too much of a DoS vector then) but interesting to note ...

markblundeberg added inline comments.
src/bloom.cpp
144 ↗(On Diff #9563)

Yeah I thought the same, but indeed out of scope.

This revision is now accepted and ready to land.Jun 25 2019, 02:42