Details

Reviewers

Group Reviewers

Restricted Project

Commits

rABC579bfd749b4d: [avalanche] Add a way to check if a VoteRecord is stale

Summary

A future patch will use this to handle the case where a vote hasn't finalized after a reasonable time.

The Avalanche whitepaper suggests that it's expected for finalization to occur after O(log(n))
rounds of voting. While our implementation differs in some ways, this expectation should hold
reasonably close. The threshold was selected to be an order of magnitude higher than the number
of votes required for finalization, which gives plenty of room for confidence to reset multiple
times before finalizing under normal network conditions.

This threshold can later be made configurable and/or tightened as needed.
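
As a rough illustration of the idea in this summary, the check can be thought of as a vote counter compared against a fixed threshold. This is a minimal sketch, not the code in this revision: `StaleCounter`, `FINALIZATION_VOTES`, `STALE_THRESHOLD` and `staleAfter` are hypothetical names, 128 is assumed as the number of votes required for finalization, and the factor of 10 is only one reading of "an order of magnitude higher".

```cpp
#include <cstdint>

// Requires C++14 or later (constexpr member functions that mutate state).

// Assumed number of votes needed to finalize (hypothetical constant name).
constexpr uint32_t FINALIZATION_VOTES = 128;
// Assumed threshold: one order of magnitude above the finalization count.
constexpr uint32_t STALE_THRESHOLD = 10 * FINALIZATION_VOTES;

// Hypothetical stand-in for the bookkeeping a vote record would need.
struct StaleCounter {
    uint32_t votes = 0;

    // Count one registered vote, saturating instead of wrapping around.
    constexpr void registerVote() {
        if (votes < UINT32_MAX) {
            ++votes;
        }
    }

    // Stale once `threshold` votes have been counted.
    // Finalization itself is not modeled in this sketch.
    constexpr bool isStale(uint32_t threshold = STALE_THRESHOLD) const {
        return votes >= threshold;
    }
};

// Helper for illustration: is a fresh counter stale after `n` votes?
constexpr bool staleAfter(uint32_t n, uint32_t threshold = STALE_THRESHOLD) {
    StaleCounter counter{};
    for (uint32_t i = 0; i < n; ++i) {
        counter.registerVote();
    }
    return counter.isStale(threshold);
}

static_assert(!staleAfter(FINALIZATION_VOTES),
              "a finalization-sized run is far from stale");
static_assert(staleAfter(STALE_THRESHOLD),
              "reaching the threshold marks the record stale");
```

Passing the threshold as a parameter keeps the sketch compatible with making it configurable later.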

Depends on D11501

Test Plan

ninja check-avalanche

Diff Detail

Repository

rABC Bitcoin ABC

Branch

inconclusive-vote-count

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 19114
Build 37976: Build Diff	build-without-wallet · lint-circular-dependencies · build-diff · build-debug · build-clang-tidy · build-clang
Build 37975: arc lint + arc unit

Event Timeline

sdulfari created this revision. May 19 2022, 23:24

Herald added a reviewer: Restricted Project. May 19 2022, 23:24

sdulfari requested review of this revision. May 19 2022, 23:24

Harbormaster completed remote builds in B19114: Diff 33635. May 19 2022, 23:39

sdulfari added a child revision: D11503: [avalanche] Set voterecord_tests node ID using a fixture. May 19 2022, 23:40

Fabien added a subscriber: Fabien. May 20 2022, 09:52

Fabien added inline comments.

src/avalanche/test/voterecord_tests.cpp
146	You only need 2 out of 8
180	You should split out the first yes vote that increases the confidence (because 1 out of 8) from the others that are inconclusive

So one successful round and we can stale forever? Seriously, I don't even know what to say, this is obviously not good.

This revision now requires changes to proceed. May 20 2022, 12:53

Fix approach so that records will go stale for all confidence levels
Add way more test coverage
Tighten threshold for low-confidence stalls

Harbormaster completed remote builds in B19128: Diff 33651. May 21 2022, 05:35

These thresholds seem REALLY low. How did you pick them?

In D11502#257747, @deadalnix wrote:

These thresholds seem REALLY low. How did you pick them?

The Avalanche whitepaper indicates an expectation of O(log(n)) rounds to finalization, where n is the number of nodes. Using bitnodes.io as a guide for the potential future size of the eCash network, the order of magnitude is 10^4 nodes, which puts us in the range of 4-5 rounds to finalization under normal circumstances. The Avalanche notion of "rounds" is a bit different from our design, but I assume that, with 128 votes as our finalization threshold, multiplying by a factor higher than the number of expected rounds should put us in a good place: 128 * 8 = 1024 for the threshold at the high-confidence end, where finalization is expected to occur.

A tighter threshold of 256 votes at the low confidence end was selected to reduce the impact of attempted DoS by polling the network for inventories that are not being distributed by anyone. This number was selected to be high enough that multiple confidence flip flops can occur under normal conditions and still finalize.
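
The arithmetic behind these two thresholds can be written out as a quick compile-time check. This is only a sketch of the reasoning in this comment: the constant names and the `floorLog10` helper are hypothetical, and base 10 is assumed for the logarithm because the estimate lands on 4-5 rounds for 10^4 nodes.

```cpp
#include <cstdint>

// Requires C++14 or later (loop inside a constexpr function).

// Integer floor of log10(n) for n >= 1 (hypothetical helper).
constexpr uint32_t floorLog10(uint64_t n) {
    uint32_t result = 0;
    while (n >= 10) {
        n /= 10;
        ++result;
    }
    return result;
}

// Assumed future network size, order of magnitude only.
constexpr uint64_t ASSUMED_NODE_COUNT = 10000;
// Expected rounds to finalization under the O(log(n)) estimate.
constexpr uint32_t EXPECTED_ROUNDS = floorLog10(ASSUMED_NODE_COUNT);

// Votes required for finalization, as stated in the comment.
constexpr uint32_t FINALIZATION_VOTES = 128;
// Multiplier chosen to exceed the expected number of rounds.
constexpr uint32_t ROUND_FACTOR = 8;

// Threshold at the high-confidence end.
constexpr uint32_t HIGH_CONFIDENCE_STALE_THRESHOLD =
    FINALIZATION_VOTES * ROUND_FACTOR;
// Tighter threshold at the low-confidence end, as stated in the comment.
constexpr uint32_t LOW_CONFIDENCE_STALE_THRESHOLD = 256;

static_assert(EXPECTED_ROUNDS == 4, "log10(10^4) is 4");
static_assert(ROUND_FACTOR > EXPECTED_ROUNDS + 1,
              "the factor exceeds the 4-5 round estimate");
static_assert(HIGH_CONFIDENCE_STALE_THRESHOLD == 1024, "128 * 8");
static_assert(LOW_CONFIDENCE_STALE_THRESHOLD == 2 * FINALIZATION_VOTES,
              "the low end equals two finalization-sized runs");
```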

Given that the actual eCash network is currently two orders of magnitude smaller than Bitcoin's, these numbers should be more than sufficient for us to deploy and test on mainnet. Anything we learn can be applied to adjusting these thresholds as needed.

I should also mention it's my expectation that confidence flips are not uniformly distributed, but more likely to occur earlier in the voting process. While it's possible to reach 127 confidence and reset twice, then hitting the low-end threshold, I believe this case to be rare.

I gave my last comment some more thought and came to this conclusion: Even if my assumptions are correct in practice,
it's better (from a UX perspective) to deploy with thresholds a bit higher than expected. Consider these scenarios:

A. ABC releases with tight thresholds that require tweaking on short notice for some reason. Until node operators
upgrade, nodes are more likely to stall votes that are difficult to reconcile.

B. ABC releases with looser thresholds that require tweaking on short notice for some reason. Until node operators
upgrade, nodes are more likely to eventually finalize, at the risk of unexpectedly higher resource consumption.

With these scenarios considered, B has the better roll-out since A requires a critical mass of upgraded nodes to work
correctly while B has a smoother upgrade-x-impact curve. With this in mind, I've doubled the thresholds from my prior
patch.

Harbormaster completed remote builds in B19145: Diff 33681. May 23 2022, 21:52

These numbers do not make sense to me.

src/avalanche/voterecord.h
22 ↗	(On Diff #33681)	Reaching finalization, if everything goes perfectly, is going to take at least 140-150 rounds. Add a couple of inconclusive rounds in there and you are not far from the threshold. This still doesn't seem high to me.
29 ↗	(On Diff #33681)	Same here, you'll get a factor > 1 if everything goes perfectly, maybe 2 with some inconclusive rounds. In addition, you take a 2x every time you flip flop, and you'll flip flop more in cases where there are a lot of inconclusive rounds. In fact, if a block takes 2s to propagate on the network, you'll need to poll with mostly inconclusive rounds for 2s, that's 200 rounds right there. If there is a race, that's 200 to start flip flopping.
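
The 200-round figure follows from dividing the propagation time by the polling interval. Below is a minimal sketch of that estimate, assuming a 10 ms interval between polling rounds (inferred from "2s ... that's 200 rounds", not taken from the code). `roundsDuring` and the constant names are hypothetical, and the combined figure assumes the finalization run only starts once propagation is over.

```cpp
#include <cstdint>

// Assumed polling interval, inferred from the comment's 2 s -> 200 rounds.
constexpr uint32_t ASSUMED_ROUND_INTERVAL_MS = 10;

// Number of whole polling rounds that fit into `durationMs`.
// Returns 0 for a zero interval rather than dividing by zero.
constexpr uint32_t roundsDuring(
    uint32_t durationMs, uint32_t intervalMs = ASSUMED_ROUND_INTERVAL_MS) {
    return intervalMs == 0 ? 0 : durationMs / intervalMs;
}

// Mostly inconclusive rounds spent while a block propagates for 2 s.
constexpr uint32_t PROPAGATION_ROUNDS = roundsDuring(2000);

// Lower bound on a perfect finalization run, as quoted in the review.
constexpr uint32_t BEST_CASE_FINALIZATION_ROUNDS = 140;

// Rough total before any flip flop or extra inconclusive round is counted.
constexpr uint32_t SLOW_BLOCK_ROUNDS =
    PROPAGATION_ROUNDS + BEST_CASE_FINALIZATION_ROUNDS;

static_assert(PROPAGATION_ROUNDS == 200, "2000 ms / 10 ms per round");
static_assert(SLOW_BLOCK_ROUNDS == 340, "200 + 140");
```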