Page MenuHomePhabricator

[avalanche] Drop stale votes after too many rounds without finalization
AbandonedPublic

Authored by sdulfari on May 16 2022, 20:30.

Details

Reviewers
Fabien
deadalnix
Group Reviewers
Restricted Project
Summary

A future patch will leverage this change so the node can take appropriate action if a vote
becomes stale. For example, the node may drop an item that has become stale.

Also fix a bug in addNodeToQuorum that incorrectly counts votes in unit tests. The new tests
utilize the corrected behavior.

Test Plan
ninja check

Run avalanche tests a few times since we added some randomness:

for I in {1..20}; do ninja check-avalanche ; done

Diff Detail

Event Timeline

Fabien requested changes to this revision.May 17 2022, 08:19
Fabien added a subscriber: Fabien.
Fabien added inline comments.
src/avalanche/voterecord.cpp
81 ↗(On Diff #33545)

Can this change be its own diff ?

src/avalanche/voterecord.h
22 ↗(On Diff #33545)

This should be unsigned

79 ↗(On Diff #33545)

Would that makes sense to check confidence < 1 as well ?

src/net_processing.cpp
5192 ↗(On Diff #33545)

This is exclusive from the other status, so stale is enough

5256 ↗(On Diff #33545)

This is contradictory with the above sentence: most peers will likely not know about the best proof either

5286 ↗(On Diff #33545)

This is probably worth a warning in the log at least

This revision now requires changes to proceed.May 17 2022, 08:19
src/avalanche/voterecord.cpp
81 ↗(On Diff #33545)

It can. I decided to leave it as part of this diff since it's both used and adds test coverage for it here. But since you raised this as a question, that rationale must not be strong enough to be obvious. I will split it apart.

src/avalanche/voterecord.h
79 ↗(On Diff #33545)

I considered that and decided against it since it doesn't add any real benefit. In the rare case this happens at random, being marked as stale is not the end of the world for this vote record. In the adversarial case, an attacker need only tweak the attack design.

src/avalanche/voterecord.cpp
81 ↗(On Diff #33545)

Actually, successfulVotes and addNodeToQuorum are both private members of VoteRecord. It's best to test behavior of the public interface instead, so I will keep this part of the diff as-is unless you are strongly opposed.

  • Added some missing test cases
  • Added requested log
  • Make threshold unsigned
  • Add more test coverage
  • Drop the stale vote in addition to reporting the VoteStatus
  • Always check shouldPoll() in getInvsForNextPoll()
  • Make the isStale() check more robust so that votes are only dropped when inflight polls have finished. Otherwise it's possible to ban peers that respond after the record has been dropped.
sdulfari retitled this revision from [avalanche] Mark vote records as stale after too many votes without finalization to [avalanche] Drop stale votes after too many rounds without finalization.May 18 2022, 04:43
Fabien requested changes to this revision.May 18 2022, 04:45
Fabien added inline comments.
src/avalanche/processor.cpp
597 ↗(On Diff #33567)

What does it solve ?

src/avalanche/voterecord.h
78 ↗(On Diff #33567)

This is possible the condition will never happen under normal circumstances.
Also I don't understand why it is necessary: removing the vote records doesn't clear the queries

This revision now requires changes to proceed.May 18 2022, 04:45
src/avalanche/processor.cpp
597 ↗(On Diff #33567)

Looking at VoteRecord's API, you would expect shouldPoll() would always be called, not that it's a special test-only function. Since I want that logic to be changed, it makes sense that this logic applies to all code paths. Otherwise, it's possible to pass unit tests when infact the entire thing is broken in production.

src/avalanche/voterecord.h
78 ↗(On Diff #33567)

You're right. I was confused about the request lifecycle. This inflight check is not necessary.

Remove unnecesarily complex logic from isStale(). The banning circumstance isn't possible since my understanding was incorrect.

Fabien requested changes to this revision.May 19 2022, 08:30
Fabien added inline comments.
src/avalanche/processor.cpp
597 ↗(On Diff #33594)

You simply don't need this at all.

What this method cares about is making sure you don't return more elements that the limit. Why does it need to care about the status of the vote ? If it's in the vote record map, then it's worth being polled. The check you introduce here is creating thread safety concern for the sake of maybe avoiding the item one round of vote, it's not worth it imo.

If anything you can do like the blocks and erase the vote records that are no longer needed via an IsWorthPolling method for example.

src/avalanche/test/processor_tests.cpp
455 ↗(On Diff #33594)

You decided that a stale vote is no longer worth polling. The current design is assuming that everything in the vote record map is worth polling.

src/avalanche/voterecord.h
79 ↗(On Diff #33594)

That's not guaranteed thread safe (despite the increment from addNoteToQuorum) is probably an atomic operation)

99 ↗(On Diff #33594)

This is mixing the concerns. Why should you not poll if the vote is stale ?

src/net_processing.cpp
5286 ↗(On Diff #33594)

Unexpected avalanche marked => avalanche marked ?

This revision now requires changes to proceed.May 19 2022, 08:30
deadalnix added inline comments.
src/avalanche/voterecord.cpp
63 ↗(On Diff #33594)

This doesn't seem like a good change. Don't count successful votes when no successful vote occured.

src/avalanche/voterecord.h
79 ↗(On Diff #33594)

Nothing in this class is thead safe, so idk why it is a poblem.

However, this method clearly doesn't make sense. The number of successful vote increasing clearly doesn't indicate staleness. If you exclude the possibility of inconclusive votes, this would in fact be indicating the opposite of stale.

deadalnix requested changes to this revision.May 19 2022, 17:20
deadalnix added inline comments.
src/avalanche/voterecord.h
79 ↗(On Diff #33594)

So, what indicates progress is the confidence level. There fore, what indicate staleness would be a confidence score too low for the number of successful vote registered - which would indicate endless flip flopping or very large number of inconclusive rounds.