Fix flakey feature_pruning
AbandonedPublic
Actions

Authored by markblundeberg on Feb 5 2020, 17:32.

Details

Reviewers

deadalnix
jasonbcox

Group Reviewers

Restricted Project

Summary

We get failures of this sort, very occasionally:
https://build.bitcoinabc.org/viewLog.html?buildId=23720&buildTypeId=BitcoinABCMasterLinux&tab=buildResultsDiv

Locally it has gotten very bad now that I'm making diffs
with sigops counting disabled, because that speeds up block
validation massively (these tests make billions of
OP_NOPs which get iterated one by one during sigop counting).

The failure is because when reorg_back starts, node 0 may still
be sending blocks to node 2 as a consequence of the previous test.
When node 2 calls invalidateblock during the reorg_back
function and then node 0 sends a block building off that invalid
chain, this earns a ban for node 0, and the test ends up timing
out later on when node 2 was supposed to have gotten a reorg chain
from node 0.

Test Plan

ninja all && ./test/functional/test_runner.py feature_pruning

Diff Detail

Repository

rABC Bitcoin ABC

Branch

feature_pruning_flake

Lint

Lint Passed

Unit

No Test Coverage

Build Status

Buildable 9278
Build 16497: Default Diff Build & Tests
Build 16496: arc lint + arc unit

Event Timeline

markblundeberg created this revision.Feb 5 2020, 17:32

Herald added a reviewer: Restricted Project. · View Herald TranscriptFeb 5 2020, 17:32

Build Bitcoin-ABC / Diffs / Diff Testing started.

(No joke about the speedup with sigops off... we're talking about logged ConnectBlock times going from 85 ms to 9 ms. And yes the test runs overall about 2x faster...)

markblundeberg added a child revision: D5029: deactivate sigops limits in phonon upgrade.Feb 5 2020, 17:35

markblundeberg added inline comments.

test/functional/feature_pruning.py
49

Build Bitcoin-ABC / Diffs / Diff Testing passed.

Harbormaster completed remote builds in B9278: Diff 16034.Feb 5 2020, 17:38

Good catch, this has been bugging me too.

test/functional/feature_pruning.py
235	The problem with that is that it is going through the RPC rather than the network, so you aren't really testing what you are supposed to anymore. You may want to add a wait for that polls for node 2's tip and see if it is where it needs to - and fail after a timeout.

This revision now requires changes to proceed.Feb 5 2020, 19:00

jasonbcox requested changes to this revision.Feb 5 2020, 19:07

jasonbcox added a subscriber: jasonbcox.

jasonbcox added inline comments.

test/functional/feature_pruning.py
229	Looks to me like the `sync_blocks` call actually belongs here because `reorg_test` should not leave the test in a bad state after completing (or nearly so). It also seems incorrect for `calc_usage` to be called if the nodes are not synced.

markblundeberg mentioned this in D5168: Merge #16538: test: Add missing sync_blocks to feature_pruning.Feb 6 2020, 01:35

Theres's a backport for this: D5168 . I could have sworn I looked for backports on this, but apparently I missed that.

test/functional/feature_pruning.py
229	Yep on both points. `calc_usage` there is checking node 2 which is the problematic one, so we definitely want it synced.
235	Hmm I don't get what you mean... `sync_blocks` is the standard way to do 'wait until all these nodes are on the same tip' and it's used throughout the test. What's wrong with RPC?

markblundeberg removed a child revision: D5029: deactivate sigops limits in phonon upgrade.Feb 6 2020, 01:37

deadalnix added inline comments.Feb 6 2020, 02:10

test/functional/feature_pruning.py
235	Nothing wrong with RPC, unless you ant to test that a node actually prune blocks when it recieve them as now block are not recieved via the same codepath.