Page MenuHomePhabricator

[CI] Do not wait too long for build logs to flush
ClosedPublic

Authored by jasonbcox on Oct 1 2020, 20:00.

Details

Reviewers
Fabien
Group Reviewers
Restricted Project
Commits
rABCc1ff352e013c: [CI] Do not wait too long for build logs to flush
Summary

Waiting too long when the process has finished only stalls the build agent
for an unnecessarily long period of time. If the logs are not available a few
seconds after the process completes, it should warn and continue.

Background:

We've started to see an increase in the number of CI builds that
appear to timeout and yet many of them are completed or nearly so.

In a common failure case such as this one, note how the build supposedly times out
and then the log flushes all at once: https://build.bitcoinabc.org/viewLog.html?buildId=77854&buildTypeId=BitcoinABC_BitcoinAbcStaging&tab=buildLog&_focus=1940
Yet scrolling to the bottom, you'll find the tests passed at the end.

The root cause of this is still unknown, although there are some hints that it
may be due to one or a combination of the following:

  • Python version (CI is currently on 3.7, and lots of changes to asyncio were made from 3.6 - 3.8, so it's hard to know if this is an issue that was fixed or not).
  • Python stdout buffering behavior. I'm running an A/B test to see if setting PYTHONUNBUFFERED modifies the behavior of the log flushing, but this will take time to evaluate. The docs indicate that this is supposed to be enabled by default in 3.7, but a quick reading of the Python source code indicates this may not be true.
  • Docker hanging. For example, https://github.com/moby/moby/issues/41052 was resolved recently, but all of our build agents should have this fix, so I think this is less likely.

I have been unsuccessful in reproducing this locally, so the next best strategy
is for us to side-step the issue until we either reproduce or it gets fixed from
regular upgrades to our infra. This issue only affects a minority of builds,
but causes a disproportionate negative impact on devs as one failure out of
many builds on a diff causes a lot of noise during review.

Test Plan

Run some builds using build-configurations.py. They should log to stdout in
realtime like usual.
Whether a build passes or fails, echo $? should still output the correct exit code.
Run a lot of builds on this diff and examine the logs.

Event Timeline

jasonbcox planned changes to this revision.
This revision is now accepted and ready to land.Oct 1 2020, 20:36

Snippet of first build failure:

[348/409] bitcoin: testing settings_tests
[349/409] Running utility command for check-bitcoin-torcontrol_tests
[350/409] Running utility command for check-bitcoin-settings_tests
[351/409] bitcoin: testing streams_tests
[352/409] bitcoin: testing timedata_tests
[353/409] Running utility command for check-bitcoin-streams_tests
[354/409] Running utility command for check-bitcoin-timedata_tests
[355/409] bitcoin: testing uint256_tests
[356/409] Running utility command for check-bitcoin-uint256_tests
[357/409] bitcoin: testing checkdatasig_tests
[358/409] bitcoin: testing walletdb_tests
[359/409] Running utility command for check-bitcoin-checkdatasig_tests
[360/409] Running utility command for check-bitcoin-walletdb_tests
[361/409] bitcoin: testing serialize_tests
[362/409] bitcoin: testing undo_tests
[363/409] bitcoin: testing sigencoding_tests
[364/409] Running utility command for check-bitcoin-serialize_tests
[365/409] Running utility command for check-bitcoin-undo_tests
[366/409] Running utility command for check-bitcoin-sigencoding_tests
[367/409] bitcoin: testing util_threadnames_tests
[368/409] bitcoin: testing radix_tests
[369/409] Running utility command for check-bitcoin-util_threadnames_tests
[370/409] Running utility command for check-bitcoin-radix_tests
[371/409] bitcoin: testing blockcheck_tests
[372/409] bitcoin: testing script_standard_tests
[373/409] Running utility command for check-bitcoin-blockcheck_tests
[374/409] bitcoin: testing crypto_tests
[375/409] Running utility command for check-bitcoin-script_standard_tests
[376/409] Running utility command for check-bitcoin-crypto_tests
[377/409] bitcoin: testing blockstatus_tests
[378/409] Running utility command for check-bitcoin-blockstatus_tests
[379/409] bitcoin: testing ismine_tests
[380/409] Running utility command for check-bitcoin-ismine_tests
[381/409] bitcoin: testing cashaddr_tests
[382/409] bitcoin: testing versionbits_tests
[383/409] bitcoin: testing getarg_tests
[384/409] Running utility command for check-bitcoin-cashaddr_tests
[385/409] Running utility command for check-bitcoin-versionbits_tests
[386/409] bitcoin: testing validation_tests
[387/409] Running utility command for check-bitcoin-getarg_tests
[388/409] Running utility command for check-bitcoin-validation_tests
[389/409] bitcoin: testing script_tests
[390/409] bitcoin: testing bswap_tests
[391/409] Running utility command for check-bitcoin-script_tests
[392/409] Running utility command for check-bitcoin-bswap_tests
[393/409] bitcoin: testing skiplist_tests
[394/409] Running utility command for check-bitcoin-skiplist_tests
[395/409] bitcoin: testing util_tests
[396/409] bitcoin: testing validation_block_tests
[397/409] Running utility command for check-bitcoin-util_tests
[398/409] Running utility command for check-bitcoin-validation_block_tests
[399/409] bitcoin: testing op_reversebytes_tests
[400/409] Running utility command for check-bitcoin-op_reversebytes_tests
[401/409] bitcoin: testing cuckoocache_tests
[402/409] Running utility command for check-bitcoin-cuckoocache_tests
[403/409] bitcoin: testing coins_tests
[404/409] Running utility command for check-bitcoin-coins_tests
[405/409] bitcoin: testing transaction_tests
[406/409] Running utility command for check-bitcoin-transaction_tests
Build build-clang timed out after 1200.0s
This revision is now accepted and ready to land.Oct 15 2020, 16:46