The choosen values for the number of threads and elements in the test
cause TSAN to peak out at 100% CPU all cores (on a 30 cores machine !)
and over 16GB of memory.
This diff makes the parameters configurable and lower the values when
running with TSAN. This will also speed up the TSAN build significantly,
as the test actually takes over 12 minutes to complete on CI. The peak
memory usage is lowered to about 2.8GB on my machine.
Details
- Reviewers
deadalnix - Group Reviewers
Restricted Project
./contrib/teamcity/build-configurations.py build-tsan
Diff Detail
- Repository
- rABC Bitcoin ABC
- Branch
- fix_tsan_radix_tests
- Lint
Lint Passed - Unit
No Test Coverage - Build Status
Event Timeline
Snippet of first build failure:
2020-07-01T13:53:43.447000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135011/wallet_txn_clone_164 2020-07-01T13:53:46.927000Z TestFramework (INFO): Stopping nodes 2020-07-01T13:53:47.333000Z TestFramework (INFO): Cleaning up /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135011/wallet_txn_clone_164 on exit 2020-07-01T13:53:47.333000Z TestFramework (INFO): Tests successful [13:54:19]i: [Bitcoin ABC functional tests] wallet_txn_clone.py --mineblock [13:54:19]i: [wallet_txn_clone.py --mineblock] [Test Output] 2020-07-01T13:53:13.180000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135011/wallet_txn_clone_143 2020-07-01T13:53:18.026000Z TestFramework (INFO): Stopping nodes 2020-07-01T13:53:18.381000Z TestFramework (INFO): Cleaning up /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135011/wallet_txn_clone_143 on exit 2020-07-01T13:53:18.381000Z TestFramework (INFO): Tests successful [13:54:19]i: [Bitcoin ABC functional tests] wallet_txn_doublespend.py [13:54:19]i: [wallet_txn_doublespend.py] [Test Output] 2020-07-01T13:54:05.458000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135011/wallet_txn_doublespend_194 2020-07-01T13:54:07.973000Z TestFramework (INFO): Stopping nodes 2020-07-01T13:54:08.378000Z TestFramework (INFO): Cleaning up /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135011/wallet_txn_doublespend_194 on exit 2020-07-01T13:54:08.378000Z TestFramework (INFO): Tests successful [13:54:19]i: [Bitcoin ABC functional tests] wallet_txn_doublespend.py --mineblock [13:54:19]i: [wallet_txn_doublespend.py --mineblock] [Test Output] 2020-07-01T13:53:15.432000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135011/wallet_txn_doublespend_144 2020-07-01T13:53:20.083000Z TestFramework (INFO): Stopping nodes 2020-07-01T13:53:20.338000Z TestFramework (INFO): Cleaning up /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135011/wallet_txn_doublespend_144 on exit 2020-07-01T13:53:20.338000Z TestFramework (INFO): Tests successful [13:54:19]i: [Bitcoin ABC functional tests] wallet_zapwallettxes.py [13:54:19]i: [wallet_zapwallettxes.py] [Test Output] 2020-07-01T13:52:51.003000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135011/wallet_zapwallettxes_132 2020-07-01T13:52:52.355000Z TestFramework (INFO): Mining blocks... 2020-07-01T13:53:00.587000Z TestFramework (INFO): Stopping nodes 2020-07-01T13:53:00.940000Z TestFramework (INFO): Cleaning up /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135011/wallet_zapwallettxes_132 on exit 2020-07-01T13:53:00.940000Z TestFramework (INFO): Tests successful [13:54:19]i: [Bitcoin ABC functional tests] feature_maxuploadtarget.py [13:54:19]i: [feature_maxuploadtarget.py] [Test Output] 2020-07-01T13:50:48.343000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135011/feature_maxuploadtarget_95 2020-07-01T13:50:59.648000Z TestFramework (INFO): Peer 0 disconnected after downloading old block too many times 2020-07-01T13:51:10.088000Z TestFramework (INFO): Peer 1 able to repeatedly download new block 2020-07-01T13:51:10.194000Z TestFramework (INFO): Peer 1 disconnected after trying to download old block 2020-07-01T13:51:10.194000Z TestFramework (INFO): Advancing system time on node to clear counters... 2020-07-01T13:51:10.299000Z TestFramework (INFO): Peer 2 able to download old block 2020-07-01T13:51:10.299000Z TestFramework (INFO): Restarting nodes with -whitelist=127.0.0.1 2020-07-01T13:51:15.685000Z TestFramework (INFO): Peer still connected after trying to download old block (whitelisted) 2020-07-01T13:51:15.736000Z TestFramework (INFO): Stopping nodes [13:54:19]i: [feature_maxuploadtarget.py] [Test Error Output] Traceback (most recent call last): File "/work/test/functional/feature_maxuploadtarget.py", line 177, in <module> MaxUploadTest().main() File "/work/test/functional/test_framework/test_framework.py", line 210, in main self.stop_nodes() File "/work/test/functional/test_framework/test_framework.py", line 407, in stop_nodes node.stop_node(wait=wait) File "/work/test/functional/test_framework/test_node.py", line 325, in stop_node "Unexpected stderr {} != {}".format(stderr, expected_stderr)) AssertionError: Unexpected stderr WARNING: too long mutex cycle found != [13:54:42] : [Creating archive artifacts.tar.gz] Archive was created, file size 225.19 MB (236136099 bytes) [13:54:42] : [Publishing artifacts] Publishing 1 file using [WebPublisher]: results/artifacts => artifacts.tar.gz [13:54:42] : [Publishing artifacts] Publishing 1 file using [ArtifactsCachePublisher]: results/artifacts => artifacts.tar.gz [13:54:18] : [Step 1/1] ninja: build stopped: subcommand failed. [13:54:18]i: [Step 1/1] ##teamcity[publishArtifacts '+:results/artifacts=>artifacts.tar.gz'] [13:54:42] : [Step 1/1] [13:54:42] : [Step 1/1] Build build-tsan failed with exit code 1 [13:54:42]W: [Step 1/1] Process exited with code 1 [13:54:51]E: [Step 1/1] Process exited with code 1 (Step: Command Line)
Each failure log is accessible here:
Bitcoin ABC functional tests: feature_maxuploadtarget.py
src/test/main.cpp | ||
---|---|---|
31 ↗ | (On Diff #21923) | That way is by using a static constructor. |
Snippet of first build failure:
2020-07-01T14:03:21.722000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135929/wallet_txn_clone_645 2020-07-01T14:03:25.720000Z TestFramework (INFO): Stopping nodes 2020-07-01T14:03:26.086000Z TestFramework (INFO): Cleaning up /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135929/wallet_txn_clone_645 on exit 2020-07-01T14:03:26.086000Z TestFramework (INFO): Tests successful [14:04:00]i: [Bitcoin ABC functional tests] wallet_txn_clone.py --mineblock [14:04:00]i: [wallet_txn_clone.py --mineblock] [Test Output] 2020-07-01T14:02:48.452000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135929/wallet_txn_clone_624 2020-07-01T14:02:53.347000Z TestFramework (INFO): Stopping nodes 2020-07-01T14:02:53.652000Z TestFramework (INFO): Cleaning up /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135929/wallet_txn_clone_624 on exit 2020-07-01T14:02:53.652000Z TestFramework (INFO): Tests successful [14:04:00]i: [Bitcoin ABC functional tests] wallet_txn_doublespend.py [14:04:00]i: [wallet_txn_doublespend.py] [Test Output] 2020-07-01T14:03:45.386000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135929/wallet_txn_doublespend_675 2020-07-01T14:03:47.847000Z TestFramework (INFO): Stopping nodes 2020-07-01T14:03:48.201000Z TestFramework (INFO): Cleaning up /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135929/wallet_txn_doublespend_675 on exit 2020-07-01T14:03:48.202000Z TestFramework (INFO): Tests successful [14:04:00]i: [Bitcoin ABC functional tests] wallet_txn_doublespend.py --mineblock [14:04:00]i: [wallet_txn_doublespend.py --mineblock] [Test Output] 2020-07-01T14:02:50.760000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135929/wallet_txn_doublespend_625 2020-07-01T14:02:55.777000Z TestFramework (INFO): Stopping nodes 2020-07-01T14:02:56.682000Z TestFramework (INFO): Cleaning up /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135929/wallet_txn_doublespend_625 on exit 2020-07-01T14:02:56.682000Z TestFramework (INFO): Tests successful [14:04:00]i: [Bitcoin ABC functional tests] wallet_zapwallettxes.py [14:04:00]i: [wallet_zapwallettxes.py] [Test Output] 2020-07-01T14:02:22.689000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135929/wallet_zapwallettxes_613 2020-07-01T14:02:24.277000Z TestFramework (INFO): Mining blocks... 2020-07-01T14:02:32.665000Z TestFramework (INFO): Stopping nodes 2020-07-01T14:02:32.868000Z TestFramework (INFO): Cleaning up /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135929/wallet_zapwallettxes_613 on exit 2020-07-01T14:02:32.868000Z TestFramework (INFO): Tests successful [14:04:00]i: [Bitcoin ABC functional tests] feature_maxuploadtarget.py [14:04:00]i: [feature_maxuploadtarget.py] [Test Output] 2020-07-01T14:00:12.221000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_135929/feature_maxuploadtarget_576 2020-07-01T14:00:23.422000Z TestFramework (INFO): Peer 0 disconnected after downloading old block too many times 2020-07-01T14:00:33.982000Z TestFramework (INFO): Peer 1 able to repeatedly download new block 2020-07-01T14:00:34.088000Z TestFramework (INFO): Peer 1 disconnected after trying to download old block 2020-07-01T14:00:34.088000Z TestFramework (INFO): Advancing system time on node to clear counters... 2020-07-01T14:00:34.249000Z TestFramework (INFO): Peer 2 able to download old block 2020-07-01T14:00:34.249000Z TestFramework (INFO): Restarting nodes with -whitelist=127.0.0.1 2020-07-01T14:00:40.178000Z TestFramework (INFO): Peer still connected after trying to download old block (whitelisted) 2020-07-01T14:00:40.230000Z TestFramework (INFO): Stopping nodes [14:04:00]i: [feature_maxuploadtarget.py] [Test Error Output] Traceback (most recent call last): File "/work/test/functional/feature_maxuploadtarget.py", line 177, in <module> MaxUploadTest().main() File "/work/test/functional/test_framework/test_framework.py", line 210, in main self.stop_nodes() File "/work/test/functional/test_framework/test_framework.py", line 407, in stop_nodes node.stop_node(wait=wait) File "/work/test/functional/test_framework/test_node.py", line 325, in stop_node "Unexpected stderr {} != {}".format(stderr, expected_stderr)) AssertionError: Unexpected stderr WARNING: too long mutex cycle found != [14:04:24] : [Creating archive artifacts.tar.gz] Archive was created, file size 225.19 MB (236137506 bytes) [14:04:24] : [Publishing artifacts] Publishing 1 file using [ArtifactsCachePublisher]: results/artifacts => artifacts.tar.gz [14:04:24] : [Publishing artifacts] Publishing 1 file using [WebPublisher]: results/artifacts => artifacts.tar.gz [14:03:58] : [Step 1/1] ninja: build stopped: subcommand failed. [14:03:58]i: [Step 1/1] ##teamcity[publishArtifacts '+:results/artifacts=>artifacts.tar.gz'] [14:04:24] : [Step 1/1] [14:04:24] : [Step 1/1] Build build-tsan failed with exit code 1 [14:04:24]W: [Step 1/1] Process exited with code 1 [14:04:35]E: [Step 1/1] Process exited with code 1 (Step: Command Line)
Each failure log is accessible here:
Bitcoin ABC functional tests: feature_maxuploadtarget.py
Snippet of first build failure:
2020-07-01T14:11:39.224000Z TestFramework (INFO): Tests successful [14:12:42]i: [Bitcoin ABC functional tests] wallet_txn_doublespend.py [14:12:42]i: [wallet_txn_doublespend.py] [Test Output] 2020-07-01T14:12:26.524000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_140819/wallet_txn_doublespend_658 2020-07-01T14:12:30.389000Z TestFramework (INFO): Stopping nodes 2020-07-01T14:12:30.745000Z TestFramework (INFO): Cleaning up /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_140819/wallet_txn_doublespend_658 on exit 2020-07-01T14:12:30.745000Z TestFramework (INFO): Tests successful [14:12:42]i: [Bitcoin ABC functional tests] wallet_txn_doublespend.py --mineblock [14:12:42]i: [wallet_txn_doublespend.py --mineblock] [Test Output] 2020-07-01T14:11:35.722000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_140819/wallet_txn_doublespend_608 2020-07-01T14:11:40.623000Z TestFramework (INFO): Stopping nodes 2020-07-01T14:11:40.927000Z TestFramework (INFO): Cleaning up /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_140819/wallet_txn_doublespend_608 on exit 2020-07-01T14:11:40.927000Z TestFramework (INFO): Tests successful [14:12:42]i: [Bitcoin ABC functional tests] wallet_zapwallettxes.py [14:12:42]i: [wallet_zapwallettxes.py] [Test Output] 2020-07-01T14:11:08.077000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_140819/wallet_zapwallettxes_596 2020-07-01T14:11:09.432000Z TestFramework (INFO): Mining blocks... 2020-07-01T14:11:17.696000Z TestFramework (INFO): Stopping nodes 2020-07-01T14:11:17.899000Z TestFramework (INFO): Cleaning up /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_140819/wallet_zapwallettxes_596 on exit 2020-07-01T14:11:17.899000Z TestFramework (INFO): Tests successful [14:12:42]i: [Bitcoin ABC functional tests] feature_maxuploadtarget.py [14:12:42]i: [feature_maxuploadtarget.py] [Test Output] 2020-07-01T14:08:54.515000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-tsan/test/tmp/test_runner_₿₵_ _20200701_140819/feature_maxuploadtarget_559 2020-07-01T14:09:06.160000Z TestFramework (INFO): Peer 0 disconnected after downloading old block too many times 2020-07-01T14:09:16.298000Z TestFramework (INFO): Peer 1 able to repeatedly download new block 2020-07-01T14:09:16.404000Z TestFramework (INFO): Peer 1 disconnected after trying to download old block 2020-07-01T14:09:16.405000Z TestFramework (INFO): Advancing system time on node to clear counters... 2020-07-01T14:09:16.572000Z TestFramework (INFO): Peer 2 able to download old block 2020-07-01T14:09:16.572000Z TestFramework (INFO): Restarting nodes with -whitelist=127.0.0.1 2020-07-01T14:09:22.161000Z TestFramework (INFO): Peer still connected after trying to download old block (whitelisted) 2020-07-01T14:09:22.213000Z TestFramework (INFO): Stopping nodes [14:12:42]i: [feature_maxuploadtarget.py] [Test Error Output] Traceback (most recent call last): File "/work/test/functional/feature_maxuploadtarget.py", line 177, in <module> MaxUploadTest().main() File "/work/test/functional/test_framework/test_framework.py", line 210, in main self.stop_nodes() File "/work/test/functional/test_framework/test_framework.py", line 407, in stop_nodes node.stop_node(wait=wait) File "/work/test/functional/test_framework/test_node.py", line 325, in stop_node "Unexpected stderr {} != {}".format(stderr, expected_stderr)) AssertionError: Unexpected stderr WARNING: too long mutex cycle found WARNING: too long mutex cycle found WARNING: too long mutex cycle found WARNING: too long mutex cycle found WARNING: too long mutex cycle found WARNING: too long mutex cycle found WARNING: too long mutex cycle found WARNING: too long mutex cycle found WARNING: too long mutex cycle found WARNING: too long mutex cycle found WARNING: too long mutex cycle found != [14:13:05] : [Creating archive artifacts.tar.gz] Archive was created, file size 225.19 MB (236137468 bytes) [14:13:05] : [Publishing artifacts] Publishing 1 file using [ArtifactsCachePublisher]: results/artifacts => artifacts.tar.gz [14:13:05] : [Publishing artifacts] Publishing 1 file using [WebPublisher]: results/artifacts => artifacts.tar.gz [14:12:40] : [Step 1/1] Build build-tsan failed with exit code 1 [14:12:40]i: [Step 1/1] ##teamcity[publishArtifacts '+:results/artifacts=>artifacts.tar.gz'] [14:13:05] : [Step 1/1] [14:13:05]W: [Step 1/1] Process exited with code 1 [14:13:18]E: [Step 1/1] Process exited with code 1 (Step: Command Line)
Each failure log is accessible here:
Bitcoin ABC functional tests: feature_maxuploadtarget.py
Now this patch looks much better, but I'd like to loop back on the motivation. This is a stress test, and the purpose of a stress test is to stress the system.
Measuring on my machine, this test lasts about a minute under tsan and takes 16G of RAM. The former is not a huge deal the later maybe. But that begs the question, why are we paying engineer time to make the test worse so they can run on underpowered machines instead of spending money on more powerful machines?
src/test/util/setup_common.h | ||
---|---|---|
89 | Pass by ref instead of making copies. |
The stress test is still running full capacity on all other tests, and when running with TSAN we are especially looking for deadlocks rather than performance issues.
The point still stands, this patch was a solution I can leverage by myself to work around the issue now but I will look at if/how we can increase the machines capacity.