Page MenuHomePhabricator

Implement banning logic for Seeder for nodes on the wrong chain
AbandonedPublic

Authored by nakihito on Apr 8 2020, 19:33.

Details

Reviewers
jasonbcox
deadalnix
Group Reviewers
Restricted Owners Package(Owns No Changed Paths)
Restricted Project
Summary

The seeder currently only bans misbehaving nodes. With D5380, this
change will allow the seeder to eventually ban a node on the wrong
chain.

As previously noted in D5380, the seeder cannot differentiate between
an IBD node/a node stuck behind the checkpoint a node on the wrong chain.
Only a node clearly on the wrong chain should be banned.
Given enough time, a good node should finish its IBD and pass
CSeederNode.Run(). On the other hand, a node on the wrong chain very likely
won't change become good later on.

Nodes worthy of being banned must have the following properties:

(1) Have failed the current run of CSeederNode.Run()
(2) Be following the wrong chain
(3) Have poor reliability
(4) Have not recently (within the last two days) passed CSeederNode.Run()

(1) and (2) are determined by the communications from CSeederNode.Run().

(3) is determined by IsReliable(), the results of repeated iteration of a
reliability function. This prevents banning nodes that may only be following
part of the chain, but are at least useful as seeds for part of the chain.
Core nodes with high reliability, for example, are not following the correct
chain, but are still useful for IBD of a fresh node.

Excluding the edge case of previously untested node, (4) is determined
using ourLastSuccess. In order to solve the edge case, the seeder's
database must additionally track the time a node was first discovered
otherwise all fresh nodes would be banned on their first CSeederNode.Run() failure.

Depends on D5380 and D5800

Review note: The seeder is currently unable to reach the unbanning logic
while it is running. In order for a node to be unbanned, the seeder
must be restarted with the --wipeban option enabled. As such banned
nodes are effectively banned forever.

Test Plan
ninja check-bitcoin-seeder

checkpoint refers to the chain's most recent checkpoint found in chainparams.cpp.
Edit chainparams.cpp such that the only seed for TestNet is localhost.
Also edit in seeder/main.cpp the line after running TestNode()

addr.clear();

This will prevent the seeder from adding new addresses making it easier to see results.

ninja
mkdir testdir
./bitcoind --datadir=testdir --testnet

Allow the node to finish IBDing

./bitcoin-cli --testnet --datadir=testdir stop
rm testdir/testnet3/peers.dat

./bitcoind -whitelist=127.0.0.1 --datadir=testdir -listen=1 -connect=0 -debug=net -datadir=testdir
./bitcoin-seeder -host=localhost -ns=blah.bitframe.org -port=8888 -mbox=info@bitframe.org --testnet --wipeban --wipeignore

Output should eventually become:

0/1 available (0 tried in 2s, 0 new, 1 active), 1 banned; 0 DNS requests, 0 db queries

Stop the seeder.

./bitcoin-cli --testnet --datadir=testdir invalidateblock <checkpoint->next->next hash>
./bitcoin-seeder -host=localhost -ns=blah.bitframe.org -port=8888 -mbox=info@bitframe.org --testnet --wipeban --wipeignore

Output should eventually become:

0/1 available (0 tried in 2s, 0 new, 1 active), 1 banned; 0 DNS requests, 0 db queries

Stop the seeder.

./bitcoin-cli --testnet --datadir=testdir invalidateblock <checkpoint->next hash>
./bitcoin-seeder -host=localhost -ns=blah.bitframe.org -port=8888 -mbox=info@bitframe.org --testnet --wipeban --wipeignore

Output should eventually become:

0/0 available (0 tried in 2s, 0 new, 0 active), 2 banned; 0 DNS requests, 0 db queries

Stop the seeder.

./bitcoin-cli --testnet --datadir=testdir invalidateblock <checkpoint->prev hash>
./bitcoin-seeder -host=localhost -ns=blah.bitframe.org -port=8888 -mbox=info@bitframe.org --testnet --wipeban --wipeignore

Output should eventually become:

0/0 available (0 tried in 2s, 0 new, 0 active), 2 banned; 0 DNS requests, 0 db queries

Stop the seeder.

./bitcoin-cli --testnet --datadir=testdir reconsiderblock <checkpoint->next hash>
./bitcoin-seeder -host=localhost -ns=blah.bitframe.org -port=8888 -mbox=info@bitframe.org --testnet --wipeban --wipeignore

Output should eventually become:

0/1 available (0 tried in 2s, 0 new, 1 active), 1 banned; 0 DNS requests, 0 db queries

Stop the seeder.

Note: To speed up testing, reduce MIN_RETRY in seeder/db.h to 45 and line 201 in seeder/main.cpp to timeSinceLastSuccess > 300) {.

Undue the changes any speed up changes made.

mkdir ibd
./bitcoind --testnet -whitelist=127.0.0.0 -debug=net -datadir=ibd
./bitcoin-seeder -host=localhost -ns=blah.bitframe.org -port=8888 -mbox=info@bitframe.org --testnet --wipeban --wipeignore

Output should start as

0/2 available (0 tried in 0s, 0 new, 2 active), 0 banned; 0 DNS requests, 0 db queries

And eventually end as

0/1 available (0 tried in 2s, 0 new, 1 active), 1 banned; 0 DNS requests, 0 db queries

after IBD finishes assuming your machine can complete IBD within two days.

Diff Detail

Repository
rABC Bitcoin ABC
Branch
SeederBans
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 10314
Build 18449: Default Diff Build & Tests
Build 18448: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
src/seeder/main.cpp
200 ↗(On Diff #18701)

Change to timeSinceLastSuccess > 300) { for testing.

jasonbcox requested changes to this revision.Apr 8 2020, 20:39
jasonbcox added a subscriber: jasonbcox.
jasonbcox added inline comments.
src/seeder/db.h
271 ↗(On Diff #18701)

This isn't a good design. What is "total tries" for a service result? Why is it relevant to the caller? (see later comments for more)

src/seeder/main.cpp
194 ↗(On Diff #18701)

Should the result ever be good if number of tries is 0? Is that even possible?

199 ↗(On Diff #18701)

If we only want to know that the node has been checked successfully before, the service result should plainly provide that information.

Also, this doesn't check for some level of "maturity" (probably a threshold on number of tries) of a bad node, so my understanding is nodes in IBD will be banned as soon as they fail to reply to the first peer test. The test plan does not appear to cover the mid-IBD case.

200 ↗(On Diff #18701)

Not for this patch, but this is a strong indication that this should be modifiable via command line args. This will make writing functional tests much easier.

Fixed text plan to include IBD, renamed some variables, and added timeFirstFound.

jasonbcox requested changes to this revision.Apr 11 2020, 00:50
jasonbcox added inline comments.
src/seeder/db.h
100 ↗(On Diff #18715)

This name indicates information about its caller, which it shouldn't know in the first place. This class doesn't know what "TestNode" is and that name is liable to change anyway.

In the context of this specific class, hasBeenTriedBefore() makes more sense IMO. If this comment is confusing, reread my older comments carefully.

101 ↗(On Diff #18715)

now is never set by the caller, so why make it a parameter?

102 ↗(On Diff #18715)

This was very confusing to read. Order of operations...

279 ↗(On Diff #18715)

why expose these here? Can't fReliable be inclusive of this information?

src/seeder/main.cpp
200 ↗(On Diff #18715)

This introduces ban logic for the first time in this function. Why? Ban logic already exists in TestNode() and further down that call stack. Isn't it more appropriate somewhere in there?

Pushed banning logic into TestNode().

Folded information into fReliable to limit the amount of CAddrInfo that is exposed.

Separated the TestNode() refactor out to its own diff (D5800).

nakihito edited the summary of this revision. (Show Details)
nakihito added a parent revision: D5800: Remove TestNode().
nakihito edited the summary of this revision. (Show Details)
jasonbcox requested changes to this revision.Apr 23 2020, 00:37
jasonbcox added inline comments.
src/seeder/db.cpp
74 ↗(On Diff #19042)

Multiple issues:

  • Why is the DB holding this logic? Like you did for timeSinceFoundOrLastSuccess() and hasBeenTriedBefore(), the logic should be close to the class that owns this logic. Databases typically do not do business logic by themselves.
  • Won't this be false for just about every good node since the last success will always be recent?
  • Also related to the previous points, this logic needs some unit tests. Coincidentally, it may be easier to test it with the logic moved to where it belongs.
src/seeder/db.h
275 ↗(On Diff #19042)

I don't think I'll block this specific diff with this since these namings affect the entire seeder, but the naming here is bad:

  • What is the difference between Good and Reliable?
  • Why are they separate?

There's no obvious answer to these questions without becoming familiar with the implementation.

  • fGood appears to only be used as "last run was good" and
  • fReliable appears to be "the runs since some time ago were good to some threshold"
src/seeder/main.cpp
209 ↗(On Diff #19042)

Ditto on ALL of the above comment's points. To clarify:

  • Why is this ban logic spread across 3 different places? Notice the node.GetBan() above already gets this value from somewhere.
  • Putting this logic here makes unit testing a nightmare.
This revision now requires changes to proceed.Apr 23 2020, 00:37

Passed more information to CSeederNode. Moved banning logic to DetermineBan().

jasonbcox requested changes to this revision.Apr 24 2020, 17:00

We discussed unit tests. Where are they?

Another tip: since this approach is way easier to unit test, it just so happens that breaking this diff into smaller digestible chunks is very easy. Consider doing so.

src/seeder/db.cpp
72 ↗(On Diff #19075)

Now that fReliable's intent has been nailed down a bit, might I suggest a better name for it? isGood? Call it what it is. I know this conflicts with fGood, but that's because fGood is inconsistent with the rest of the codebase and hence it's a bad name.

This revision now requires changes to proceed.Apr 24 2020, 17:00
nakihito added inline comments.
src/seeder/db.cpp
72 ↗(On Diff #19075)

Actually, I think fReliable is the correct name for this variable and IsGood() should be renamed to match (see D5830). I do agree that fGood is not very good, though.

Properly squashed changes.

deadalnix requested changes to this revision.Apr 25 2020, 21:21
deadalnix added a subscriber: deadalnix.

The description is completely incomprehensible. The problem statement is defined as solving a problem in another patch that has not landed (because it has a problem).

Going through the code, it seems to be modifying the database in various ways for reasons that are completely unclear to me. In fact, the title nor the description mention the database anywhere.

Finally, the whole logic seems to be time based, which in itself is a smell.

src/seeder/bitcoin.h
55 ↗(On Diff #19122)

If it is a time since event X, then it needs to be continuously updated, which I don't really think you want to be doing.

This revision now requires changes to proceed.Apr 25 2020, 21:21
nakihito retitled this revision from Implement banning logic for Seeder to Implement banning logic for Seeder for nodes on the wrong chain.Apr 27 2020, 19:43
nakihito edited the summary of this revision. (Show Details)
nakihito edited the test plan for this revision. (Show Details)

This was far too complicated and completely unnecessary.