Page MenuHomePhabricator

Delay the peer logic validation startup
AbandonedPublic

Authored by Fabien on Nov 25 2020, 20:18.

Details

Reviewers
deadalnix
Group Reviewers
Restricted Project
Summary

This fixes an issue found by TSAN:
https://build.bitcoinabc.org/viewLog.html?buildTypeId=BitcoinABC_ResourceIntensiveBuilds_BitcoinAbcMasterTsan&buildId=135843&tab=buildLog&logTab=tree&filter=debug&expand=all&_focus=1665&guest=1

What happened here is a race on the m_max_outbound_full_relay or
m_max_outbound_block_relay conman members. There is a circular
dependency betwen CConman and PeerLogicValidation, which is "solved"
by the use of an Init() function to pass the peer logic validation to
the conman. The racey members are initialized at this time.
But the peer logic validation starts a thread at construction time which
in turns needs access to these members. Under certain timing
circumstances, the conman initialization and the peer logic validation
can concurrently set/read these members, causing the race.

This behavior can only occur once at initialization time (either in the
main app or during tests) and only when certain timing conditions are
met, which happens under heavy load. This is why TSAN triggered on CI:
the unit tests running in parallel are very resources intensive,
especially since we have the radix insert stress test consuming 100% CPU
for quite some time and requiring over 16GB of memory. This is also the
reason why core won't notice this issue: they don't have this test.

This diff fixes the issue by delaying the peer validation logic thread.
The constructor is split so that a Start() method is extracted and can
be called after the conman initialization completed.

Test Plan

Rebase this patch on top of D8376.
With TSAN:

ninja check-bitcoin-radix_tests check-bitcoin-coinselector_tests

./contrib/teamcity/build-configurations.py build-tsan

Diff Detail

Event Timeline

Fabien requested review of this revision.Nov 25 2020, 20:19
deadalnix requested changes to this revision.Nov 26 2020, 18:50
deadalnix added a subscriber: deadalnix.

This is adding temporal dependency. Th scheduler thread is already manually started, so that temporal dependency is 100% redundant.

This revision now requires changes to proceed.Nov 26 2020, 18:50