Page MenuHomePhabricator

fix abc-parkedchain timeouts
ClosedPublic

Authored by markblundeberg on Feb 14 2019, 04:14.

Details

Summary

Lately on the Diff Testing we have seen random failures in this test.
By looking at logs we can see the following race condition was happening:

  1. Node0 generates some new blocks.
  2. Node1 requests & receives headers.
  3. Node0 runs invalidateblock on some blocks.
  4. Node1 requests the blocks corresponding to the full header list.
  5. Node0 ignores requests of invalid blocks, per BlockRequestAllowed.
  6. Node0 re-validates those blocks and continues on.
  7. Test proceeds up to wait_for_tip(). However, Node1 is still waiting for blocks that will never come. Timeout after 60 seconds.

This should break the race by requiring node1 to be synced before any rewind commands are done.

(thanks to @florian for noticing this in the logs)

Test Plan

very hard to test this rare failure; I can't seem to reproduce it locally.

Diff Detail

Repository
rABC Bitcoin ABC
Branch
stop_timeout
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 4984
Build 8031: Bitcoin ABC Buildbot (legacy)
Build 8030: arc lint + arc unit

Event Timeline

Example: https://build.bitcoinabc.org/viewLog.html?buildId=11443&buildTypeId=BitcoinABC_BitcoinAbcStaging&tab=artifacts

Check out the debug logs.

  1. node0 generates 3d4e9cb5afdebb672a5c3e7fd21b1a6b04fec955d8bc43e0ea04d6d4d5950dc1 at height 230 -- note the test starts with both nodes at height 200.
  2. node1 requests headers; it gets 3d4e9cb5afdebb672a5c3e7fd21b1a6b04fec955d8bc43e0ea04d6d4d5950dc1 in header list
  3. node0 invalidates 3d4e9cb5afdebb672a5c3e7fd21b1a6b04fec955d8bc43e0ea04d6d4d5950dc1
  4. node1 requests 3d4e9cb5afdebb672a5c3e7fd21b1a6b04fec955d8bc43e0ea04d6d4d5950dc1
  5. node0 log entry: ignoring request from peer=0 for old block that isn't in the main chain
  6. node0 re-validates 3d4e9cb5afdebb672a5c3e7fd21b1a6b04fec955d8bc43e0ea04d6d4d5950dc1
  7. ... test appears to continue until the test times out because we are waiting for node1 to reach the tip of a chain that includes 3d4e9cb5afdebb672a5c3e7fd21b1a6b04fec955d8bc43e0ea04d6d4d5950dc1

(block 1b703817f5809bf806aa67bd23c1a950d5c6be2c4df7fbc04f454f7fc1964329 is also never received, so there are two 'old block' messages)

In this case the race timing seems to be exacerbated by the fact that node1 spins up later than node0, so it has to get 30 blocks all at once. This increases the duration between the time when node1 requests headers to when it requests the block.

jasonbcox requested changes to this revision.Feb 14 2019, 04:59
jasonbcox added a subscriber: jasonbcox.
jasonbcox added inline comments.
test/functional/abc-parkedchain.py
149

Similar to this line, you should just use wait_for_tip instead of sync_blocks. I almost did something similar in the past: https://reviews.bitcoinabc.org/D2189?id=6335#inline-12547

Just move the definition of wait_for_tip to the top of the file.

This revision now requires changes to proceed.Feb 14 2019, 04:59

use wait_for_tip instead of sync_blocks

This revision is now accepted and ready to land.Feb 14 2019, 05:34