Page MenuHomePhabricator

util_tests.test_LockDirectory: avoid exit in forked process
AbandonedPublic

Authored by markblundeberg on Feb 9 2020, 13:20.

Details

Reviewers
Fabien
deadalnix
Group Reviewers
Restricted Project
Summary

exit(0) was causing memory errors during shutdown of this forked process, in an unreliable way (only visible on CI system).

Example: https://build.bitcoinabc.org/viewLog.html?buildId=28627

@Fabien and I found that failure was happening since --logger=JUNIT passed in to the boost test suite (D5042). The most reliable way to see the errors on exit was (with either clang or gcc compiled):

valgrind ./src/test/test_bitcoin --logger=JUNIT  -t util_tests

Since we're using fork() anyway, we might as well just run execlp() which nukes the child process image. As a bonus this stops valgrind from producing any noise about this child process.

Test Plan

ninja check

Diff Detail

Repository
rABC Bitcoin ABC
Branch
quickexit
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 9372
Build 16675: Default Diff Build & Tests
Build 16674: arc lint + arc unit

Event Timeline

markblundeberg retitled this revision from util_tests: avoid exit in forked process to util_tests.test_LockDirectory: avoid exit in forked process.Feb 9 2020, 13:21
markblundeberg edited the summary of this revision. (Show Details)

update comment

markblundeberg edited the summary of this revision. (Show Details)
deadalnix requested changes to this revision.Feb 9 2020, 20:12
deadalnix added a subscriber: deadalnix.
deadalnix added inline comments.
src/test/util_tests.cpp
1289 ↗(On Diff #16194)

According to https://en.cppreference.com/w/cpp/utility/program/exit , std::exit should run all the proper exit code. execlp probably not - no C lib call would guarantee that.

Not sure if naked exit is the same as std::exit , I assume the libc doesn't guarantee C++'s cleanup. If the problem is C++'s cleanup, then this is all good. If the goal is to exit WITHOUT cleaning up, then std::abort is what you want.

This revision now requires changes to proceed.Feb 9 2020, 20:12

update comment, second argument to execlp needs to be nonnull

src/test/util_tests.cpp
1289 ↗(On Diff #16194)

Yeah the primary problem is C++'s cleanup, specifically involving some static objects created by the Boost test suite logging subsystem. Running valgrind with the command below uncovers that there is a use-after-free during the cleanup, and the actual crashes causing problems in the CI were apparently a double-free "corrupted double-linked list" error also involving Boost cleanup, probably related. It's still unclear why forking would induce such a problem.

Using std::abort actually causes test failure since the subprocess now returns nonzero status (the unit test checks for this, BOOST_CHECK_EQUAL(processstatus, 0); -- that's why segfault in subprocess was causing overall test failure), but quick_exit does do the trick. The only thing is that this introduces a new problem - glibc+valgrind have a bug in cleanup that looks like this: https://stackoverflow.com/a/58292789/5667904 . As a side issue, valgrind also produces a memory leak report that complains about the sub process having leaked memory if it doesn't clean up.

So what I have here is the cleanest option, as odd as it is, since it basically avoids ALL memory cleanup handlers, both C and C++, and also tells valgrind to not worry about the memory leaks. It is the only option I can find that allows the following to pass test suite and not produce any valgrind complaints:

valgrind ./src/test/test_bitcoin --logger=JUNIT -t util_tests
src/test/util_tests.cpp
1289 ↗(On Diff #16194)

If quick_exit is what we want, it is still possible to use it associated with a suppression file to avoid the glibc bug 24476.
This one is working and specific enough (only applies to quick_exit inside our test case):

{
   Suppress glibc bug 24476
   Memcheck:Addr8
   ...
   fun:__dlerror_main_freeres
   fun:__libc_freeres
   ...
   fun:quick_exit*
   fun:TestOtherProcess
}

There is also a Valgrind option to not use the freeres feature at all that prevent the issue to throw: --run-libc-freeres=no but I don't know what could be the consequences when running against other pieces of code (it is advertised as causing false positives).

src/test/util_tests.cpp
1289 ↗(On Diff #16194)

Yeah the suppressor option seems most ideal for that approach. The only downside in comparison to current diff is that valgrind will still print out a nasty leak report for the subprocess, but as long as we know what is going on that's fine, and it sounds like we can add --child-silent-after-fork=yes though I haven't tested that.

src/test/util_tests.cpp
1257 ↗(On Diff #16197)

Note - this test only runs in non-WIN32 systems as seen here, and "true" is a fairly old and trustworthy command on unix systems which I imagine are the only ones that do support fork(). https://en.wikipedia.org/wiki/True_and_false_(commands)

src/test/util_tests.cpp
1289 ↗(On Diff #16194)

See D5254 for this alternative solution.