Page MenuHomePhabricator

[avalanche] Remember nodes pending a proof and bind them when available
ClosedPublic

Authored by Fabien on Jun 25 2021, 12:52.

Details

Reviewers
PiRK
deadalnix
Group Reviewers
Restricted Project
Maniphest Tasks
Restricted Maniphest Task
Commits
rABCfb5cc4cd497d: [avalanche] Remember nodes pending a proof and bind them when available
Summary

This diff adds a structure for remembering the nodes waiting for a
proof, and bind them as soon as this proof is known (i.e. a peer with
this proof is created).

Ref T1634.

Test Plan
ninja all check-avalanche

Diff Detail

Event Timeline

Fabien published this revision for review.Jun 26 2021, 12:08
This revision is now accepted and ready to land.Jun 26 2021, 14:26
deadalnix requested changes to this revision.Jun 28 2021, 11:28
deadalnix added a subscriber: deadalnix.

You have a fundamental problem here. You can register a node for 2 proofs and 2 keys. In addition, the key seems duplicated between the PeerManager and the node state. Is it used at all in the peer manager? If not, then get rid of it. In any case, you need to resolve this duplication problem.

It seems like you'd need unit tests for this.

This revision now requires changes to proceed.Jun 28 2021, 11:28

Address feedback, rebase on top of D9720:

  • The pubkey is no longer duplicated
  • Use a multi_index for the pending nodes enforce the node id uniqueness
  • Split out the network layer binding/activation flag so the feature is tested via unit tests only in this diff

Failed tests logs:

====== Bitcoin ABC functional tests with the next upgrade activated: p2p_blockfilters.py ======

------- Stdout: -------
2021-07-05T14:43:54.641000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-diff/test/tmp/test_runner_₿₵_  _20210705_144159/p2p_blockfilters_290
2021-07-05T14:43:59.769000Z TestFramework (INFO): get cfcheckpt on chain to be re-orged out.
2021-07-05T14:43:59.820000Z TestFramework (INFO): Reorg node 0 to a new chain.
2021-07-05T14:44:02.278000Z TestFramework (INFO): Check that peers can fetch cfcheckpt on active chain.
2021-07-05T14:44:02.331000Z TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
  File "/work/test/functional/test_framework/test_framework.py", line 127, in main
    self.run_test()
  File "/work/test/functional/p2p_blockfilters.py", line 115, in run_test
    assert_equal(response.stop_hash, request.stop_hash)
  File "/work/test/functional/test_framework/util.py", line 60, in assert_equal
    for arg in (thing1, thing2) + args)))
AssertionError: not(20976074930306478592554250300701327872062515716428437594998550489965476602495 == 53636499989420933765690773610164576424255569450400882896522682575725428720027)
2021-07-05T14:44:02.381000Z TestFramework (INFO): Stopping nodes
2021-07-05T14:44:02.934000Z TestFramework (WARNING): Not cleaning up dir /work/abc-ci-builds/build-diff/test/tmp/test_runner_₿₵_  _20210705_144159/p2p_blockfilters_290
2021-07-05T14:44:02.934000Z TestFramework (ERROR): Test failed. Test logging available at /work/abc-ci-builds/build-diff/test/tmp/test_runner_₿₵_  _20210705_144159/p2p_blockfilters_290/test_framework.log
2021-07-05T14:44:02.934000Z TestFramework (ERROR): 
2021-07-05T14:44:02.934000Z TestFramework (ERROR): Hint: Call /work/test/functional/combine_logs.py '/work/abc-ci-builds/build-diff/test/tmp/test_runner_₿₵_  _20210705_144159/p2p_blockfilters_290' to consolidate all logs
2021-07-05T14:44:02.935000Z TestFramework (ERROR): 
2021-07-05T14:44:02.935000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log.
2021-07-05T14:44:02.935000Z TestFramework (ERROR): https://github.com/Bitcoin-ABC/bitcoin-abc/issues
2021-07-05T14:44:02.935000Z TestFramework (ERROR):

Each failure log is accessible here:
Bitcoin ABC functional tests with the next upgrade activated: p2p_blockfilters.py

deadalnix requested changes to this revision.Jul 7 2021, 15:45

The general approach seems to make sense, but holly hell, the code is a total mess. You need to think about what these different function do, or should do, and ventilate accordingly, instead of just patching things in random places.

I won't hide that I'm upset. The code in there was clean, it has turned into a complete mess. It now does all kind of random crap that seem unrelated (broadcast all proofs? Does the mempool has a broadcast all transaction? It doesn't, which leaves me with the question: how did this code become even more nonsensical that the legacy satoshi codebase? Is that a joke? Am I being pranked?). there is simply no way for me - and I doubt anyone else - to convince themselve that this code is correct or even udnerstand what it does. In fact, it's not even clear anymore what anything in there actually does. The API that is exposed for tests has grow to 17 methods, vs 6 for the actual class API. How does this happen?

This only need to touch a couple of codepath:

  • When adding a node, if we don't have a peer, add it as pending node - or modify existing pending node status with the right proof.
  • When creating a new peer, syphon all the pending node that belong to that peer.
  • When removing a peer, put all of its nodes back in the pending node list.

So how come this touches code all over the place, checks for orphaning, and need to check for random crap all over the place? This patch should look like three green blocks + header + tests.

src/avalanche/peermanager.cpp
26 ↗(On Diff #29064)

Add a comment explaining the process here. I get it now, but it's far from obvious.

36 ↗(On Diff #29064)

Is there any instance where addOrUpdateNode would want to avoid running this operation? It doesn't looks like it to me.

60 ↗(On Diff #29064)

This is incomprehensible. I don't know what's happening there, or why this needs to be changed in this patch. It generally seems like there is a multiplication of function in there that don't really have a clear purpose and code is added randomly over time. Why do we have codepath that leave things dangling? how does the current code affect this? What are the invariants that hold? I don't think anyone knows anymore.

235 ↗(On Diff #29064)

The changes in there do not seem to belong in this patch.

323 ↗(On Diff #29064)

How does this make sense? The proof is attached to the peer, so how can it be an orphan?

src/avalanche/peermanager.h
107 ↗(On Diff #29064)

Fair enough :)

225 ↗(On Diff #29064)

Shouldn't this be a size_t ?

225 ↗(On Diff #29064)

Also, is it needed at all? For tests, you could simply get the vector of node ids and check that. Simply checking the count doesn't ensure that things are working properly anyways.

235 ↗(On Diff #29064)

I have so many questions.

This revision now requires changes to proceed.Jul 7 2021, 15:45

Address feedback:

  • Rebase over various refactors
  • There are now only 4 places with code change (the 4th one being the node removal)
  • The getNodeCount method is replaced by a friend test structure for better test accuracy

There is a change in behavior since the last version: when a peer is removed, all the nodes
are unconditionnaly moved to the pending set. In the previous version, it was only done if
the peer was removed due to an orphaned proof, which was unecessary complexity.

The pending node erasure is kept oustside of the addOrUpdateNode(). Erasing in the method
would make the code more complex because it needs to deal with the iterators being
invalidated.

deadalnix requested changes to this revision.Jul 17 2021, 09:45

A couple of questions, but this looks much better now.

src/avalanche/peermanager.cpp
33

Why isn't addOrUpdateNode doing this?

295

I thought there was an erase API that take two iterators and blast everything in between the two, no?

This revision now requires changes to proceed.Jul 17 2021, 09:45

Move the erasure to addOrUpdateNode. This makes the code easier to reason about but requires a double loop in the fetchOrUpdatePeer() method to deal with invalidated iterators.

This revision is now accepted and ready to land.Jul 19 2021, 10:26