Page MenuHomePhabricator

[avalanche] add nodes when we have received their proof
AbandonedPublic

Authored by PiRK on Jun 10 2021, 15:23.

Details

Reviewers
Fabien
deadalnix
Group Reviewers
Restricted Project
Maniphest Tasks
Restricted Maniphest Task
Summary

Add a map of nodes waiting for their proof. When receiving a valid avahello message, a node is added to that container. Every 2 to 3 minutes we check if a valid proof (=avalanche peer) has been registered for this node, and add the node when possible.

The size of the container is limited by the maximum number of connected peers. A node is removed from the container when it is disconnected.

Depends on D9686

Test Plan
cmake -GNinja ..   -DCMAKE_BUILD_TYPE=Debug   -DENABLE_SANITIZERS=thread
ninja all check-all

Event Timeline

PiRK added a task: Restricted Maniphest Task.Jun 10 2021, 15:30
Fabien added inline comments.
src/net_processing.cpp
1114 ↗(On Diff #28849)

Nit: const

1125 ↗(On Diff #28849)

Nit: capitalize

handle node disconnection: remove all the corresponding nodeid entries from the map

TODO: we probably don't want to allow for multiple proofids with the same nodeid, as the PeerManager only permits one peer per node.
But we may want to allow multiple nodes waiting for the same proof.

Swap key <-> value in the map, to enforce a unique nodeid and allow for a proof to be associated with multiple nodes.

If a node sends multiple avahello messages, the last received proofid is considered and the previous ones are ignored.

TODO: test not working yet

make the test work using an unreasonable timeout

Failed tests logs:

====== Bitcoin ABC functional tests: abc_p2p_avalanche.py ======

------- Stdout: -------
2021-06-11T18:53:47.386000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-debug/test/tmp/test_runner_₿₵_  _20210611_185212/abc_p2p_avalanche_429
2021-06-11T18:53:50.875000Z TestFramework (INFO): Poll for the chain tip...
2021-06-11T18:53:50.929000Z TestFramework (INFO): Poll for a selection of blocks...
2021-06-11T18:53:50.987000Z TestFramework (INFO): Poll for a selection of blocks, but some are now invalid...
2021-06-11T18:53:51.119000Z TestFramework (INFO): Poll for unknown blocks...
2021-06-11T18:53:51.174000Z TestFramework (INFO): Trigger polling from the node...
2021-06-11T18:53:51.190000Z TestFramework (INFO): Testing getavalanchepeerinfo...
2021-06-11T18:53:51.283000Z TestFramework (INFO): Answer all polls to finalize...
2021-06-11T18:53:51.556000Z TestFramework (INFO): Answer all polls to park...
2021-06-11T18:53:53.853000Z TestFramework (INFO): Check the node is discouraging unexpected avaresponses.
2021-06-11T18:53:53.911000Z TestFramework (INFO): Check the node is signalling the avalanche service bit only if there is a proof.
2021-06-11T18:53:54.619000Z TestFramework (INFO): Test the avahello signature (node -> P2PInterface)
2021-06-11T18:53:54.730000Z TestFramework (INFO): Test that wrong avahello signature causes a ban
2021-06-11T18:53:54.940000Z TestFramework (INFO): Check that receiving a valid avahello triggers a proof getdata request
2021-06-11T18:53:57.001000Z TestFramework (INFO): Check that the interface gets added as a avalanche node
2021-06-11T18:56:50.418000Z TestFramework (INFO): Check that we can download the proof from our peer
2021-06-11T18:56:50.425000Z TestFramework (INFO): Proof has been inv'ed recently, check it can be requested
2021-06-11T18:56:50.827000Z TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
  File "/work/test/functional/test_framework/test_framework.py", line 126, in main
    self.run_test()
  File "/work/test/functional/abc_p2p_avalanche.py", line 472, in run_test
    "-avamasterkey=cND2ZvtabDbJ1gucx9GWH6XT9kgTAqfb6cotPt5Q5CyxVDhid2EN",
  File "/work/test/functional/test_framework/test_framework.py", line 508, in restart_node
    self.stop_node(i)
  File "/work/test/functional/test_framework/test_framework.py", line 494, in stop_node
    self.nodes[i].stop_node(expected_stderr, wait=wait)
  File "/work/test/functional/test_framework/test_node.py", line 434, in stop_node
    self.wait_until_stopped()
  File "/work/test/functional/test_framework/test_node.py", line 461, in wait_until_stopped
    timeout_factor=self.timeout_factor)
  File "/work/test/functional/test_framework/util.py", line 264, in wait_until
    if predicate():
  File "/work/test/functional/test_framework/test_node.py", line 449, in is_node_stopped
    "Node returned non-zero exit code ({}) when stopping".format(return_code))
AssertionError: [node 0] Node returned non-zero exit code (-6) when stopping
2021-06-11T18:56:50.878000Z TestFramework (INFO): Stopping nodes
------- Stderr: -------
Traceback (most recent call last):
  File "/work/test/functional/abc_p2p_avalanche.py", line 495, in <module>
    AvalancheTest().main()
  File "/work/test/functional/test_framework/test_framework.py", line 146, in main
    exit_code = self.shutdown()
  File "/work/test/functional/test_framework/test_framework.py", line 272, in shutdown
    self.stop_nodes()
  File "/work/test/functional/test_framework/test_framework.py", line 500, in stop_nodes
    node.stop_node(wait=wait, wait_until_stopped=False)
  File "/work/test/functional/test_framework/test_node.py", line 413, in stop_node
    self.stop(wait=wait)
  File "/work/test/functional/test_framework/coverage.py", line 48, in __call__
    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
  File "/work/test/functional/test_framework/authproxy.py", line 157, in __call__
    'POST', self.__url.path, postdata.encode('utf-8'))
  File "/work/test/functional/test_framework/authproxy.py", line 116, in _request
    self.__conn.request(method, path, postdata, headers)
  File "/usr/lib/python3.7/http/client.py", line 1260, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1306, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1255, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1030, in _send_output
    self.send(msg)
  File "/usr/lib/python3.7/http/client.py", line 970, in send
    self.connect()
  File "/usr/lib/python3.7/http/client.py", line 942, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "/usr/lib/python3.7/socket.py", line 727, in create_connection
    raise err
  File "/usr/lib/python3.7/socket.py", line 716, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

Each failure log is accessible here:
Bitcoin ABC functional tests: abc_p2p_avalanche.py

PiRK edited the summary of this revision. (Show Details)

use the mockscheduler RPC to trigger the node-proof pairing

PiRK published this revision for review.Jun 11 2021, 19:43

Failed tests logs:

====== Bitcoin ABC functional tests: abc_p2p_avalanche.py ======

------- Stdout: -------
2021-06-11T19:46:09.670000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-debug/test/tmp/test_runner_₿₵_  _20210611_194426/abc_p2p_avalanche_388
2021-06-11T19:46:13.172000Z TestFramework (INFO): Poll for the chain tip...
2021-06-11T19:46:13.225000Z TestFramework (INFO): Poll for a selection of blocks...
2021-06-11T19:46:13.283000Z TestFramework (INFO): Poll for a selection of blocks, but some are now invalid...
2021-06-11T19:46:13.409000Z TestFramework (INFO): Poll for unknown blocks...
2021-06-11T19:46:13.465000Z TestFramework (INFO): Trigger polling from the node...
2021-06-11T19:46:13.480000Z TestFramework (INFO): Testing getavalanchepeerinfo...
2021-06-11T19:46:13.583000Z TestFramework (INFO): Answer all polls to finalize...
2021-06-11T19:46:13.844000Z TestFramework (INFO): Answer all polls to park...
2021-06-11T19:46:16.070000Z TestFramework (INFO): Check the node is discouraging unexpected avaresponses.
2021-06-11T19:46:16.129000Z TestFramework (INFO): Check the node is signalling the avalanche service bit only if there is a proof.
2021-06-11T19:46:16.937000Z TestFramework (INFO): Test the avahello signature (node -> P2PInterface)
2021-06-11T19:46:17.047000Z TestFramework (INFO): Test that wrong avahello signature causes a ban
2021-06-11T19:46:17.257000Z TestFramework (INFO): Check that receiving a valid avahello triggers a proof getdata request
2021-06-11T19:46:19.321000Z TestFramework (INFO): Check that the interface gets added as a avalanche node
2021-06-11T19:46:19.375000Z TestFramework (INFO): Check that we can download the proof from our peer
2021-06-11T19:46:19.431000Z TestFramework (INFO): Proof has been inv'ed recently, check it can be requested
2021-06-11T19:46:19.933000Z TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
  File "/work/test/functional/test_framework/test_framework.py", line 126, in main
    self.run_test()
  File "/work/test/functional/abc_p2p_avalanche.py", line 473, in run_test
    "-avamasterkey=cND2ZvtabDbJ1gucx9GWH6XT9kgTAqfb6cotPt5Q5CyxVDhid2EN",
  File "/work/test/functional/test_framework/test_framework.py", line 508, in restart_node
    self.stop_node(i)
  File "/work/test/functional/test_framework/test_framework.py", line 494, in stop_node
    self.nodes[i].stop_node(expected_stderr, wait=wait)
  File "/work/test/functional/test_framework/test_node.py", line 434, in stop_node
    self.wait_until_stopped()
  File "/work/test/functional/test_framework/test_node.py", line 461, in wait_until_stopped
    timeout_factor=self.timeout_factor)
  File "/work/test/functional/test_framework/util.py", line 264, in wait_until
    if predicate():
  File "/work/test/functional/test_framework/test_node.py", line 449, in is_node_stopped
    "Node returned non-zero exit code ({}) when stopping".format(return_code))
AssertionError: [node 0] Node returned non-zero exit code (-6) when stopping
2021-06-11T19:46:19.985000Z TestFramework (INFO): Stopping nodes
------- Stderr: -------
Traceback (most recent call last):
  File "/work/test/functional/abc_p2p_avalanche.py", line 496, in <module>
    AvalancheTest().main()
  File "/work/test/functional/test_framework/test_framework.py", line 146, in main
    exit_code = self.shutdown()
  File "/work/test/functional/test_framework/test_framework.py", line 272, in shutdown
    self.stop_nodes()
  File "/work/test/functional/test_framework/test_framework.py", line 500, in stop_nodes
    node.stop_node(wait=wait, wait_until_stopped=False)
  File "/work/test/functional/test_framework/test_node.py", line 413, in stop_node
    self.stop(wait=wait)
  File "/work/test/functional/test_framework/coverage.py", line 48, in __call__
    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
  File "/work/test/functional/test_framework/authproxy.py", line 157, in __call__
    'POST', self.__url.path, postdata.encode('utf-8'))
  File "/work/test/functional/test_framework/authproxy.py", line 116, in _request
    self.__conn.request(method, path, postdata, headers)
  File "/usr/lib/python3.7/http/client.py", line 1260, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1306, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1255, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.7/http/client.py", line 1030, in _send_output
    self.send(msg)
  File "/usr/lib/python3.7/http/client.py", line 970, in send
    self.connect()
  File "/usr/lib/python3.7/http/client.py", line 942, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "/usr/lib/python3.7/socket.py", line 727, in create_connection
    raise err
  File "/usr/lib/python3.7/socket.py", line 716, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

Each failure log is accessible here:
Bitcoin ABC functional tests: abc_p2p_avalanche.py

PiRK planned changes to this revision.Jun 11 2021, 20:00

I will investigate the test failure tomorrow. For now I can't reproduce it at home.

Work around a lock order inversion

PiRK retitled this revision from [avalanche] periodically check if a peer can be added as an avalanche node to [avalanche] add nodes when we have received their proof.Jun 13 2021, 20:07
PiRK edited the summary of this revision. (Show Details)
Fabien requested changes to this revision.Jun 14 2021, 06:56
Fabien added inline comments.
src/net_processing.cpp
1122 ↗(On Diff #28892)

AFAIK this is not protected by any lock, so you might get a race condition here.

This revision now requires changes to proceed.Jun 14 2021, 06:56
PiRK edited the summary of this revision. (Show Details)

rebase on D9686 and lock cs_avalanche_state before reading the peer's delegation

Refactor the new code to use a single temporary container instead of 2.
Now the nodes are added to the temp container when matching new proofs are found, then removed if failing to be added as an avalanche peer, and the remaining ones are then removed from the delegation map.

rebase onto master after splitting abc_p2p_avalanche.py, move the new test code to abc_p2p_avalanche_peer_discovery.py

Fabien requested changes to this revision.Jun 17 2021, 07:30
Fabien added inline comments.
src/net_processing.cpp
1118

const ref

1163

If you start with an empty container of the same type this can be simplified and avoid copies. See how the orphan proof proceeds for rescan as an example

4235

This is not updated if the new delegation is invalid

This revision now requires changes to proceed.Jun 17 2021, 07:30

Address feedback.
Optimize PairNodesWithProofs.
Refactor the AvaHello message handling code so that the delegation and the proof id stored in the map are always in sync. If the delegation is invalid, we don't even need to create the AvalancheState and store the delegation.

This whole approach do not seem very sensible to me.

src/net_processing.cpp
408 ↗(On Diff #28945)

You don't really need a map here, because the proof id is in the avalanche state in the node.

408 ↗(On Diff #28945)

Note: unless you reverse the map, see other comment.

1150 ↗(On Diff #28945)

This whole approach doesn't seem sensible to me. There is a point in time where you know some change is possible on this front: when a new proof is received. Why reprocess all the node on a regualar basis? If you reverse the mapping to fetch by proof id, you can even directly find the concerned nodes.

Instead this constantly mull over all the nodes for not good reason at all.

src/net_processing.cpp
1150 ↗(On Diff #28945)

I started this with reversing the key and value, but I found that it implied the possibility of having multiple proofs for a node but not multiple nodes for a proof. That's why I flipped it this way.

I thought of the possibility of having a map of <ProofId, std::vector<NodeId>>, but I chose <NodeId, ProofId> because it naturally solved the NodeId unicity problem.

I wondered if it maybe should be solved upstream of this code, by not allowing a node to send another AvaHello message after its AvalancheState is already set. That way, we woulkdn't risk registering multiple ProofIDs for a single NodeId.

deadalnix requested changes to this revision.Jun 17 2021, 23:03
deadalnix added inline comments.
src/net_processing.cpp
1150 ↗(On Diff #28945)

Even then, this ends up having to pool over the whole datastructure constantly, so this is clearly wrong.

This revision now requires changes to proceed.Jun 17 2021, 23:03

We will use a different approach, with node-proof pairing done in the peermanager.