Page MenuHomePhabricator

[avalanche] use the OrphanProofPool
ClosedPublic

Authored by deadalnix on May 5 2021, 08:32.

Details

Reviewers
majcosta
Fabien
PiRK
Group Reviewers
Restricted Project
Commits
rABCc565a795866a: [avalanche] use the OrphanProofPool
Summary

Plug the orphan proofs pool into PeerManager and start using it to store proofs with missing utxos or wrong UTXOs that could potentially become good in case of a reorg.
Re-check all proofs at each chaintip change, create a new peer for proofs that have become good, remove them from the orphan pool.

Regarding the choice of setting the pool capacity to 10000 stakes, I ran a benchmark and found that I can verify 10000 single-stake proofs in 0.39s. It seems reasonable to keep this number below 1 second if we want to check all the proofs at each tip change.

Test Plan

ninja all check-all

Event Timeline

PiRK requested review of this revision.May 5 2021, 08:32

Failed tests logs:

====== Bitcoin ABC functional tests: abc_mining_basic.py ======

------- Stdout: -------
2021-05-05T08:41:33.646000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-debug/test/tmp/test_runner_₿₵_  _20210505_083745/abc_mining_basic_540
2021-05-05T08:41:34.217000Z TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
  File "/work/test/functional/test_framework/test_framework.py", line 126, in main
    self.run_test()
  File "/work/test/functional/abc_mining_basic.py", line 146, in run_test
    self.run_for_node(self.nodes[1], MINER_FUND_LEGACY_ADDR)
  File "/work/test/functional/abc_mining_basic.py", line 69, in run_for_node
    'minimumvalue': 0,
  File "/work/test/functional/abc_mining_basic.py", line 58, in assert_getblocktemplate
    assert_equal(blockTemplate[key], value)
  File "/work/test/functional/test_framework/util.py", line 60, in assert_equal
    for arg in (thing1, thing2) + args)))
AssertionError: not({'minerfund': {'addresses': ['2MviGxxFciGeWTgkUgYgjqehWt18c4ZsShd'], 'minimumvalue': 200000000}} == {'minerfund': {'addresses': [], 'minimumvalue': 0}})
2021-05-05T08:41:34.267000Z TestFramework (INFO): Stopping nodes
2021-05-05T08:41:34.370000Z TestFramework (WARNING): Not cleaning up dir /work/abc-ci-builds/build-debug/test/tmp/test_runner_₿₵_  _20210505_083745/abc_mining_basic_540
2021-05-05T08:41:34.370000Z TestFramework (ERROR): Test failed. Test logging available at /work/abc-ci-builds/build-debug/test/tmp/test_runner_₿₵_  _20210505_083745/abc_mining_basic_540/test_framework.log
2021-05-05T08:41:34.370000Z TestFramework (ERROR): 
2021-05-05T08:41:34.370000Z TestFramework (ERROR): Hint: Call /work/test/functional/combine_logs.py '/work/abc-ci-builds/build-debug/test/tmp/test_runner_₿₵_  _20210505_083745/abc_mining_basic_540' to consolidate all logs
2021-05-05T08:41:34.370000Z TestFramework (ERROR): 
2021-05-05T08:41:34.370000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log.
2021-05-05T08:41:34.370000Z TestFramework (ERROR): https://github.com/Bitcoin-ABC/bitcoin-abc/issues
2021-05-05T08:41:34.370000Z TestFramework (ERROR):

Each failure log is accessible here:
Bitcoin ABC functional tests: abc_mining_basic.py

majcosta added inline comments.
src/avalanche/orphanproofpool.cpp
45–49 ↗(On Diff #28357)

tying up a class' public API to the underlying implementation is a bad idea, why not just move part of the code in PeerManager::updatedBlockTip() to methods in the OrphanProofPool class?

something like "void orphanProof(Proof p)" or something, and let the pool worry about how it goes about doing it

src/avalanche/test/orphanproofpool_tests.cpp
166–179 ↗(On Diff #28357)

this seems a bit excessive, it seems to be testing whether std::vector and ranged_for loops work

src/avalanche/test/peermanager_tests.cpp
451–453 ↗(On Diff #28357)

how likely is it that two GetRandHash() called in sequence will collide?

PiRK added inline comments.
src/avalanche/peermanager.cpp
16 ↗(On Diff #28357)
17 ↗(On Diff #28357)

Suggested by @Fabien

deadalnix requested changes to this revision.May 5 2021, 15:39
deadalnix added a subscriber: deadalnix.
deadalnix added inline comments.
src/avalanche/peermanager.cpp
235 ↗(On Diff #28357)

Do we need a copy here?

More importantly, if fetchOrCreatePeer is responsible to decide if a proof is an orphan and take apropriate actions, then updatedBlockTip is unsound.

src/avalanche/test/orphanproofpool_tests.cpp
179 ↗(On Diff #28357)

testing that function is not unnecessary, but this once again doesn't test anything.

Are the elements iterated over the right ones? In the order we expect? This isn't testing the behavior that we expect fro this function.

This is getting very tiring. Not only this isn't the first time, but in D9425, not only this was an issue as well, but trimToMaximumSize was indeed bogous, all the tests did pass, because, they did not actually test what they were supposed to, but we had to go through somethign like 5 to 10 extra revision to get them sorted out.

204 ↗(On Diff #28357)

This whole test is pretty much testing nothing.

src/avalanche/test/peermanager_tests.cpp
453 ↗(On Diff #28357)

More importantly, is that behavior that need to be tested in this test? If this is a behavior is needed, then the test will fail, so, in order to avoid that, we fail the test? None of that makes any sense.

This revision now requires changes to proceed.May 5 2021, 15:39

Improve tests according to review. Enforce, document and test the sequential iteration over proof in forEachProof.
Fix the potential problem of updating the pool while iterating on it in updatedBlockTip.
Let fetchOrCreatePeer remove proofs from the orphan pool if they are found to be good.
Don't copy proofs in the temporary vectors created in updateBlockTip, use pointers instead.

This change does not satisfy me because it multiplicates the places where proofs are verified in the peermanager, and there are two places where proofs can be added to the orphan pool. I want to investigate alternative options.

PiRK planned changes to this revision.May 6 2021, 14:39
PiRK edited the summary of this revision. (Show Details)

change the logic for rescanning the orphan pool at tip update:
move orphan proofs to a temporary container, empty the orphan pool and let fetchOrCreatePeer check the proofs again and add them back to orphans if need be.

This removes OrphanProofPool::forEachProof and introduces OrphanProofPool::dump().

deadalnix requested changes to this revision.May 26 2021, 16:30
deadalnix added inline comments.
src/avalanche/peermanager.cpp
190 ↗(On Diff #28596)

If something doesn't use any state from that function, but use internal state from the orphanPool, do you:
a/ Copy the whole state from the orphan pool, process it here and then discard that state or
b/ Do it int he orphanPool?

Also, the move here is not useful, you are taking by reference so it does nothing.

229 ↗(On Diff #28596)

Does it matter?

481 ↗(On Diff #28596)

Maybe the orphanPool isn't a member of the PeerManager, then, if that it needed.

After all, the transaction's orphan pool is not a member of the mempool, so it's not like that would be very surprising.

This revision now requires changes to proceed.May 26 2021, 16:30

Make dump() an internal method of the orphan pool, add a public rescan method that takes a peermanager as a parameter, makes a temporary copy of the pool, clears the main container and calls PeerManager::fetchOrCreatePeer on all proofs (which adds the proofs back to the main container if they are still orphans).

Tail of the build log:

/work /work/abc-ci-builds/lint-circular-dependencies
A new circular dependency in the form of "avalanche/orphanproofpool -> avalanche/peermanager -> avalanche/orphanproofpool" appears to have been introduced.

/work/abc-ci-builds/lint-circular-dependencies
Build lint-circular-dependencies failed with exit code 1

fix circular dependency by replacing the PeerManager parameter by a generic lambda function in OrphanProofPool::rescan.

deadalnix requested changes to this revision.May 27 2021, 13:54
deadalnix added inline comments.
src/avalanche/orphanproofpool.cpp
52 ↗(On Diff #28636)

So, why is it necessary? I don't think it is.

54 ↗(On Diff #28636)

Why does this need to go through a function? Do we expect to do many different operations in there? It doesn't look like it from this patch.

src/avalanche/peermanager.cpp
16 ↗(On Diff #28636)

Is it the state that is an orphan?

src/avalanche/peermanager.h
194 ↗(On Diff #28636)

That seems to indicate that the orphan pool probably shouldn't be owned by the peer manager.

This revision now requires changes to proceed.May 27 2021, 13:54
deadalnix edited reviewers, added: PiRK; removed: deadalnix.
  • Reduce lock scope
  • Avoid unecessary copies of datastructures
  • Various simplifications

Tail of the build log:

/work /work/abc-ci-builds/lint-circular-dependencies
A new circular dependency in the form of "avalanche/orphanproofpool -> avalanche/peermanager -> avalanche/orphanproofpool" appears to have been introduced.

/work/abc-ci-builds/lint-circular-dependencies
Build lint-circular-dependencies failed with exit code 1

Failed tests logs:

====== Bitcoin ABC functional tests with the next upgrade activated: p2p_blockfilters.py ======

------- Stdout: -------
2021-06-01T15:21:44.207000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-diff/test/tmp/test_runner_₿₵_  _20210601_152017/p2p_blockfilters_219
2021-06-01T15:21:48.790000Z TestFramework (INFO): get cfcheckpt on chain to be re-orged out.
2021-06-01T15:21:48.841000Z TestFramework (INFO): Reorg node 0 to a new chain.
2021-06-01T15:21:51.303000Z TestFramework (INFO): Check that peers can fetch cfcheckpt on active chain.
2021-06-01T15:21:51.354000Z TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
  File "/work/test/functional/test_framework/test_framework.py", line 126, in main
    self.run_test()
  File "/work/test/functional/p2p_blockfilters.py", line 115, in run_test
    assert_equal(response.stop_hash, request.stop_hash)
  File "/work/test/functional/test_framework/util.py", line 60, in assert_equal
    for arg in (thing1, thing2) + args)))
AssertionError: not(30345662698139305969986565030255702256781599906099494278534549280382841861875 == 24994139922257106584563868156807671063594542952134545782081485019224417401219)
2021-06-01T15:21:51.405000Z TestFramework (INFO): Stopping nodes
2021-06-01T15:21:51.606000Z TestFramework (WARNING): Not cleaning up dir /work/abc-ci-builds/build-diff/test/tmp/test_runner_₿₵_  _20210601_152017/p2p_blockfilters_219
2021-06-01T15:21:51.606000Z TestFramework (ERROR): Test failed. Test logging available at /work/abc-ci-builds/build-diff/test/tmp/test_runner_₿₵_  _20210601_152017/p2p_blockfilters_219/test_framework.log
2021-06-01T15:21:51.606000Z TestFramework (ERROR): 
2021-06-01T15:21:51.607000Z TestFramework (ERROR): Hint: Call /work/test/functional/combine_logs.py '/work/abc-ci-builds/build-diff/test/tmp/test_runner_₿₵_  _20210601_152017/p2p_blockfilters_219' to consolidate all logs
2021-06-01T15:21:51.607000Z TestFramework (ERROR): 
2021-06-01T15:21:51.607000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log.
2021-06-01T15:21:51.607000Z TestFramework (ERROR): https://github.com/Bitcoin-ABC/bitcoin-abc/issues
2021-06-01T15:21:51.607000Z TestFramework (ERROR):

Each failure log is accessible here:
Bitcoin ABC functional tests with the next upgrade activated: p2p_blockfilters.py

majcosta requested changes to this revision.Jun 2 2021, 01:11
majcosta added inline comments.
src/avalanche/orphanproofpool.cpp
50 ↗(On Diff #28707)
This revision now requires changes to proceed.Jun 2 2021, 01:11

forgot to comment. I think you need to clear the proofs container since std::move leaves it in a 'valid, but unspecified state'. I assume that means it could contain garbage or something.

otherwise lgtm

PiRK resigned from this revision.EditedJun 2 2021, 08:32

I'm resigning as a reviewer for this one because there are obviously too many things I don't understand, here. My understanding was that moving a variable makes it invalid, I don't understand how you can move the entire container and keep using it.
i also don't get how adding a new circular dependency is less bad than any of my other attemps at solving this.

Note: ignore my inline comments, those are old drafts that I never submitted and forgot to delete before this answer.

src/avalanche/orphanproofpool.cpp
52 ↗(On Diff #28636)

I tried finding a solution without doing this temporary dump, but I wasn't able to find one yet. The problem is that fetchOrCreateProof is modifying the container while we iterate over it.

54 ↗(On Diff #28636)

I couldn't find a better way to call fetchOrCreatePeer without causing a circular dependency (I tried passing a reference to the PeerManager)

clear the container explicitely

Fabien requested changes to this revision.Jun 2 2021, 15:14
Fabien added inline comments.
src/avalanche/orphanproofpool.cpp
7 ↗(On Diff #28722)

That doesn't make sense imo.
Can't you make the orphan pool a friend class to the peer manager and let him handle the rescan ? or just make the ProofContainer public ?

This revision now requires changes to proceed.Jun 2 2021, 15:14
deadalnix added inline comments.
src/avalanche/orphanproofpool.cpp
7 ↗(On Diff #28722)

Let's start with why doesn't it make sense?

src/avalanche/orphanproofpool.cpp
7 ↗(On Diff #28722)

Because this is introducing coupling between caller and callee for a single method which makes little to no sense in the context of its class. What does rescan means for an orphan proof pool ? If I didn't know the code I couldn't guess by reading the method.

In the end here is how I see the design, feel free to disagree: you want to maybe move proofs from orphan to valid. The orphan pool belongs to the peer manager, and so does the valid proofs. The conclusion is that the function belongs to the peer manager. As I see it the only reason you want to put it in the orphan pool is because you can't access the proof container, which is why I suggested others solutions that achieve that.

src/avalanche/orphanproofpool.cpp
7 ↗(On Diff #28722)

I agree it's not ideal, but alternative seems objectively worse. The coupling is lighter than any alternative you suggest for instance, making classes friend would couple the specific data type, while this doesn't expose any of the guts of any of the parts.

OK let's move on with this. Since the orphan proof is always accessed through the peer manager (tests excepted) if this become an issue it should be cheap enough to refactor.

This revision is now accepted and ready to land.Jun 4 2021, 11:49
This revision was automatically updated to reflect the committed changes.