HomePhabricator

Fix a potential data race between CConman and the PeerLogicValidation

Description

Fix a potential data race between CConman and the PeerLogicValidation

Summary:
This issue was found by TSAN:
https://build.bitcoinabc.org/viewLog.html?buildTypeId=BitcoinABC_ResourceIntensiveBuilds_BitcoinAbcMasterTsan&buildId=135843&tab=buildLog&logTab=tree&filter=debug&expand=all&_focus=1665&guest=1

What happened here is a race on the m_max_outbound_full_relay or
m_max_outbound_block_relay conman members. There is a cirular
dependency betwen CConman and PeerLogicValidation, which is "solved"
by the use of an Init() function to pass the peer logic validation to
the conman. The racey members are initialized at this time.
But the peer logic validation starts a thread at construction time which
in turns needs access to these members. Under certain timing
circumstances, the conman initialization and the peer logic validation
can concurrently set/read these members, causing the race.

This behavior can only occur once at initialization time (either in the
main app or during tests) and only when certain timing conditions are
met, which happens under heavy load. This is why TSAN triggered on CI:
the unit tests running in parallel are very resources intensive,
especially since we have the radix insert stress test consuming 100% CPU
for quite some time and requiring over 16GB of memory. This is also the
reason why core won't notice this issue: they don't have this test.

Since the peer validation logic locks cs_main, we can do the same during
the racey init part. Since this is only called once at startup it will
not cause any performance issue.

Test Plan:
Rebase this patch on top of D8376.
With TSAN:

ninja check-bitcoin-radix_tests check-bitcoin-coinselector_tests

./contrib/teamcity/build-configurations.py build-tsan

Reviewers: #bitcoin_abc, deadalnix

Reviewed By: #bitcoin_abc, deadalnix

Subscribers: deadalnix

Differential Revision: https://reviews.bitcoinabc.org/D8525

Details

Provenance
FabienAuthored on Nov 25 2020, 13:58
FabienPushed on Dec 9 2020, 13:42
Reviewer
Restricted Project
Differential Revision
D8525: Fix a potential data race between CConman and the PeerLogicValidation
Parents
rABC207a1f435e9e: [seeder] Use netmagic from chainparams instead of a cached global
Branches
Unknown
Tags
Unknown