Page MenuHomePhabricator

[avalanche] Add a way to check if a VoteRecord is stale
ClosedPublic

Authored by sdulfari on May 19 2022, 23:24.

Details

Reviewers
deadalnix
Group Reviewers
Restricted Project
Commits
rABC579bfd749b4d: [avalanche] Add a way to check if a VoteRecord is stale
Summary

A future patch will use this to handle the case where a vote hasn't finalized after a reasonable time.

The Avalanche whitepaper suggests that it's expected for finalization to occur after O(log(n))
rounds of voting. While our implementation differs in some ways, this expectation should hold
reasonably close. The threshold was selected to be an order of magnitude higher than the number
of votes required for finalization, which gives plenty of room for confidence to reset multiple
times before finalizing under normal network conditions.

This threshold can later be made configurable and/or tightened as needed.

Depends on D11501

Test Plan
ninja check-avalanche

Diff Detail

Event Timeline

Fabien added inline comments.
src/avalanche/test/voterecord_tests.cpp
146 ↗(On Diff #33635)

You only need 2 out of 8

180 ↗(On Diff #33635)

You should split out the first yes vote that increases the confidence (because 1 out of 8) from the others that are inconclusive

deadalnix requested changes to this revision.May 20 2022, 12:53
deadalnix added a subscriber: deadalnix.

So one successful round and w can stale forever? Seriously, I don't even know what to say, this is obviously not good.

This revision now requires changes to proceed.May 20 2022, 12:53
  • Fix approach so that records will go stale for all confidence levels
  • Add way more test coverage
  • Tighten threshold for low-confidence stalls

These thresholds seems REALLY low. How did you pick them?

These thresholds seems REALLY low. How did you pick them?

Avalanche whitepaper indicates an expectation of O(log(n)) rounds to finalization where n is number of nodes. Using bitnodes.io as a guide for potential future size of the eCash network, order of mag is 10^4 nodes, which puts us in the range of 4-5 rounds to finalization under normal circumstances. Avalanche notion of "rounds" is a bit different from our design, but I make the assumption that 128 votes being our finalization threshold, multiplying by a factor higher than the number of expected rounds should put us in a good place. 128 * 8 = 1024 for the threshold at the high confidence end where finalization is expected to occur.

A tighter threshold of 256 votes at the low confidence end was selected to reduce the impact of attempted DoS by polling the network for inventories that are not being distributed by anyone. This number was selected to be high enough that multiple confidence flip flops can occur under normal conditions and still finalize.

Given that the actual eCash network size is currently two orders of mag smaller than Bitcoin, these numbers should be more than sufficient for us to deploy and test on mainnet. Any learnings can be applied to adjusting these thresholds as needed.

I should also mention it's my expectation that confidence flips are not uniformly distributed, but more likely to occur earlier in the voting process. While it's possible to reach 127 confidence and reset twice, then hitting the low-end threshold, I believe this case to be rare.

I gave my last comment some more thought and came to this conclusion: Even if my assumptions are correct in practice,
it's better (from a user UX perspective) to deploy with thresholds a bit higher than expected. Consider these scenarios:

A. ABC releases with tight thresholds that require tweaking under short notice for some reason. Until node operators
upgrade, nodes are more likely to stall votes that are difficult to reconcile.

B. ABC releases with looser thresholds that require tweaking under short notice for some reason. Until node operators
upgrade, nodes are more likely to eventually finalize, at the risk of using unexpectedly higher resource consumption.

With these scenarios considered, B has the better roll-out since A requires a critical mass of upgraded nodes to work
correctly while B has a smoother upgrade-x-impact curve. With this in mind, I've doubled the thresholds from my prior
patch.

deadalnix requested changes to this revision.May 24 2022, 15:50

These number do not make sense to me.

src/avalanche/voterecord.h
22 ↗(On Diff #33681)

Reaching finalization, if everything goes perfect, is going to take at least 140 - 150 rounds. Add a couple of inconclusive rounds in there and you are not far from the thresold. This still doesn't seem high to me.

29 ↗(On Diff #33681)

Same here, you'll get a factor > 1 if everything goes perfect, maybe 2 with some inconclusive rounds. In addition you take a 2x every time you flip flop, and you'll flip flop more in cases where there are a lot of inconclusive rounds.

In fact, If a block takes 2s to propagate on the network, you'll need to poll with mostly inconclusive rounds for 2s, that's 200 rounds right there. If there is a race, that's 200 to start flip flopping.

This revision now requires changes to proceed.May 24 2022, 15:50
  • Adjust thresholds to better work for blocks with long propagation delays
  • Fix some hard coded values in the tests so thresholds can be adjusted easier
This revision is now accepted and ready to land.May 24 2022, 20:42