Details

Reviewers

PiRK
deadalnix

Group Reviewers

Restricted Project

Maniphest Tasks

Restricted Maniphest Task

Commits

rABCfb5cc4cd497d: [avalanche] Remember nodes pending a proof and bind them when available

Summary

This diff adds a structure for remembering the nodes waiting for a
proof, and bind them as soon as this proof is known (i.e. a peer with
this proof is created).

Ref T1634.

Test Plan

ninja all check-avalanche

Diff Detail

Repository

rABC Bitcoin ABC

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

Fabien created this revision.Jun 25 2021, 12:52

Herald added a reviewer: Restricted Project. · View Herald TranscriptJun 25 2021, 12:52

Harbormaster completed remote builds in B16058: Diff 28982.Jun 25 2021, 13:07

Fix a typo

Fabien published this revision for review.Jun 26 2021, 12:08

Harbormaster completed remote builds in B16064: Diff 28991.Jun 26 2021, 12:17

PiRK accepted this revision.Jun 26 2021, 14:26

This revision is now accepted and ready to land.Jun 26 2021, 14:26

You have a fundamental problem here. You can register a node for 2 proofs and 2 keys. In addition, the key seems duplicated between the PeerManager and the node state. Is it used at all in the peer manager? If not, then get rid of it. In any case, you need to resolve this duplication problem.

It seems like you'd need unit tests for this.

This revision now requires changes to proceed.Jun 28 2021, 11:28

Address feedback, rebase on top of D9720:

The pubkey is no longer duplicated
Use a multi_index for the pending nodes enforce the node id uniqueness
Split out the network layer binding/activation flag so the feature is tested via unit tests only in this diff

Fabien edited the summary of this revision. (Show Details)Jul 5 2021, 14:37

Fabien edited the test plan for this revision. (Show Details)

Fabien added parent revisions: D9720: [avalanche] Don't check the delegation in the peer manager, D9738: [avalanche] Improve readability of the multi index container definitions.

Fabien removed a parent revision: D9700: [avalanche] Don't require the proof when adding an avalanche node.

Fabien added a child revision: D9739: [avalanche] Activate automated peer discovery.Jul 5 2021, 14:42

Harbormaster completed remote builds in B16109: Diff 29064.Jul 5 2021, 14:46

Build Bitcoin ABC Diffs / Diff Testing (build-diff) failed.

Failed tests logs:

====== Bitcoin ABC functional tests with the next upgrade activated: p2p_blockfilters.py ======

------- Stdout: -------
2021-07-05T14:43:54.641000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-diff/test/tmp/test_runner_₿₵_  _20210705_144159/p2p_blockfilters_290
2021-07-05T14:43:59.769000Z TestFramework (INFO): get cfcheckpt on chain to be re-orged out.
2021-07-05T14:43:59.820000Z TestFramework (INFO): Reorg node 0 to a new chain.
2021-07-05T14:44:02.278000Z TestFramework (INFO): Check that peers can fetch cfcheckpt on active chain.
2021-07-05T14:44:02.331000Z TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
  File "/work/test/functional/test_framework/test_framework.py", line 127, in main
    self.run_test()
  File "/work/test/functional/p2p_blockfilters.py", line 115, in run_test
    assert_equal(response.stop_hash, request.stop_hash)
  File "/work/test/functional/test_framework/util.py", line 60, in assert_equal
    for arg in (thing1, thing2) + args)))
AssertionError: not(20976074930306478592554250300701327872062515716428437594998550489965476602495 == 53636499989420933765690773610164576424255569450400882896522682575725428720027)
2021-07-05T14:44:02.381000Z TestFramework (INFO): Stopping nodes
2021-07-05T14:44:02.934000Z TestFramework (WARNING): Not cleaning up dir /work/abc-ci-builds/build-diff/test/tmp/test_runner_₿₵_  _20210705_144159/p2p_blockfilters_290
2021-07-05T14:44:02.934000Z TestFramework (ERROR): Test failed. Test logging available at /work/abc-ci-builds/build-diff/test/tmp/test_runner_₿₵_  _20210705_144159/p2p_blockfilters_290/test_framework.log
2021-07-05T14:44:02.934000Z TestFramework (ERROR): 
2021-07-05T14:44:02.934000Z TestFramework (ERROR): Hint: Call /work/test/functional/combine_logs.py '/work/abc-ci-builds/build-diff/test/tmp/test_runner_₿₵_  _20210705_144159/p2p_blockfilters_290' to consolidate all logs
2021-07-05T14:44:02.935000Z TestFramework (ERROR): 
2021-07-05T14:44:02.935000Z TestFramework (ERROR): If this failure happened unexpectedly or intermittently, please file a bug and provide a link or upload of the combined log.
2021-07-05T14:44:02.935000Z TestFramework (ERROR): https://github.com/Bitcoin-ABC/bitcoin-abc/issues
2021-07-05T14:44:02.935000Z TestFramework (ERROR):

Each failure log is accessible here:
Bitcoin ABC functional tests with the next upgrade activated: p2p_blockfilters.py

The general approach seems to make sense, but holly hell, the code is a total mess. You need to think about what these different function do, or should do, and ventilate accordingly, instead of just patching things in random places.

I won't hide that I'm upset. The code in there was clean, it has turned into a complete mess. It now does all kind of random crap that seem unrelated (broadcast all proofs? Does the mempool has a broadcast all transaction? It doesn't, which leaves me with the question: how did this code become even more nonsensical that the legacy satoshi codebase? Is that a joke? Am I being pranked?). there is simply no way for me - and I doubt anyone else - to convince themselve that this code is correct or even udnerstand what it does. In fact, it's not even clear anymore what anything in there actually does. The API that is exposed for tests has grow to 17 methods, vs 6 for the actual class API. How does this happen?

This only need to touch a couple of codepath:

When adding a node, if we don't have a peer, add it as pending node - or modify existing pending node status with the right proof.
When creating a new peer, syphon all the pending node that belong to that peer.
When removing a peer, put all of its nodes back in the pending node list.

So how come this touches code all over the place, checks for orphaning, and need to check for random crap all over the place? This patch should look like three green blocks + header + tests.

src/avalanche/peermanager.cpp
26 ↗	(On Diff #29064)	Add a comment explaining the process here. I get it now, but it's far from obvious.
36 ↗	(On Diff #29064)	Is there any instance where `addOrUpdateNode` would want to avoid running this operation? It doesn't looks like it to me.
60 ↗	(On Diff #29064)	This is incomprehensible. I don't know what's happening there, or why this needs to be changed in this patch. It generally seems like there is a multiplication of function in there that don't really have a clear purpose and code is added randomly over time. Why do we have codepath that leave things dangling? how does the current code affect this? What are the invariants that hold? I don't think anyone knows anymore.
235 ↗	(On Diff #29064)	The changes in there do not seem to belong in this patch.
323 ↗	(On Diff #29064)	How does this make sense? The proof is attached to the peer, so how can it be an orphan?
src/avalanche/peermanager.h
107 ↗	(On Diff #29064)	Fair enough :)
225 ↗	(On Diff #29064)	Shouldn't this be a size_t ?
225 ↗	(On Diff #29064)	Also, is it needed at all? For tests, you could simply get the vector of node ids and check that. Simply checking the count doesn't ensure that things are working properly anyways.
235 ↗	(On Diff #29064)	I have so many questions.

This revision now requires changes to proceed.Jul 7 2021, 15:45

Fabien mentioned this in D9755: [avalanche] Minor simplification in the fetchOrCreatePeer method.Jul 8 2021, 11:57

Fabien mentioned this in rABC49317e5958f8: [avalanche] Minor simplification in the fetchOrCreatePeer method.Jul 8 2021, 12:53

Fabien edited the summary of this revision. (Show Details)Jul 13 2021, 09:42

Address feedback:

Rebase over various refactors
There are now only 4 places with code change (the 4th one being the node removal)
The getNodeCount method is replaced by a friend test structure for better test accuracy

There is a change in behavior since the last version: when a peer is removed, all the nodes
are unconditionnaly moved to the pending set. In the previous version, it was only done if
the peer was removed due to an orphaned proof, which was unecessary complexity.

The pending node erasure is kept oustside of the addOrUpdateNode(). Erasing in the method
would make the code more complex because it needs to deal with the iterators being
invalidated.

Harbormaster completed remote builds in B16167: Diff 29156.Jul 13 2021, 10:12

A couple of questions, but this looks much better now.

src/avalanche/peermanager.cpp
33 ↗	(On Diff #29156)	Why isn't addOrUpdateNode doing this?
295 ↗	(On Diff #29156)	I thought there was an erase API that take two iterators and blast everything in between the two, no?

This revision now requires changes to proceed.Jul 17 2021, 09:45

Move the erasure to addOrUpdateNode. This makes the code easier to reason about but requires a double loop in the fetchOrUpdatePeer() method to deal with invalidated iterators.

deadalnix accepted this revision.Jul 19 2021, 10:26

This revision is now accepted and ready to land.Jul 19 2021, 10:26

Harbormaster completed remote builds in B16208: Diff 29231.Jul 19 2021, 10:54

Closed by commit rABCfb5cc4cd497d: [avalanche] Remember nodes pending a proof and bind them when available (authored by Fabien). · Explain WhyJul 19 2021, 11:03

This revision was automatically updated to reflect the committed changes.

Fabien added a commit: rABCfb5cc4cd497d: [avalanche] Remember nodes pending a proof and bind them when available.