[Chronik] Fix: "Segmentation fault" during shutdown
Closed, Public

Authored by tobias_ruck on May 5 2023, 06:58.

Details

Reviewers
Fabien
PiRK
Group Reviewers
Restricted Project
Commits
rABC27dd0b084c8c: [Chronik] Fix: "Segmentation fault" during shutdown
Summary

Because of where chronik::Stop() is currently called during shutdown, the indexer might still access memory that has already been freed. Moving the call after GetMainSignals().FlushBackgroundCallbacks() fixes this.
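
To illustrate why the ordering matters, here is a toy Python model of the shutdown sequence. It is only a sketch under assumptions: `Indexer`, `CallbackQueue` and `shutdown` are hypothetical names, not Bitcoin ABC or Chronik code, and it assumes the crash comes from queued background callbacks that still reach the indexer after it has been stopped. Where the C++ node would hit a use-after-free, the model raises an exception instead.

```python
from collections import deque


class Indexer:
    """Toy stand-in for the indexer (hypothetical, not the real Chronik API)."""

    def __init__(self):
        self.alive = True
        self.blocks_seen = []

    def on_block(self, block_hash):
        # In the C++ node this would touch freed memory; here we raise instead.
        if not self.alive:
            raise RuntimeError("indexer used after stop")
        self.blocks_seen.append(block_hash)

    def stop(self):
        self.alive = False


class CallbackQueue:
    """Toy stand-in for a queue of pending background callbacks."""

    def __init__(self):
        self._pending = deque()

    def add(self, callback):
        self._pending.append(callback)

    def flush(self):
        # Run every pending callback, oldest first.
        while self._pending:
            self._pending.popleft()()


def shutdown(stop_before_flush):
    """Run a toy shutdown with one callback still pending."""
    indexer = Indexer()
    queue = CallbackQueue()
    queue.add(lambda: indexer.on_block("block-1"))

    if stop_before_flush:
        # Old order: the indexer is gone while a callback is still queued.
        indexer.stop()
        queue.flush()
    else:
        # New order: drain the callbacks first, then stop the indexer.
        queue.flush()
        indexer.stop()
    return indexer.blocks_seen
```

In this model, `shutdown(stop_before_flush=False)` processes the pending block and returns normally, while `shutdown(stop_before_flush=True)` raises `RuntimeError`, mirroring the crash on shutdown.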

Test Plan

The functional test (chronik_shutdown.py) should fail with a segfault if chronik::Stop() is not moved as in this diff, and should pass once it is.

Alternatively, this should also reproduce the bug:

  1. Build with ninja
  2. Run IBD (initial block download) for a few minutes
  3. Stop the node: it crashes with a segmentation fault

Event Timeline

Diff for checking if the test actually fails with a segfault

@bot build-linux32 build-linux64 build-linux-aarch64 build-linux-arm build-osx build-win64

Failed tests logs:

====== Bitcoin ABC functional tests: chronik_shutdown.py ======

------- Stdout: -------
2023-05-26T16:05:52.073000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-linux-aarch64/test/tmp/test_runner_₿₵_  _20230526_155653/chronik_shutdown_233
2023-05-26T16:07:01.321000Z TestFramework.p2p (WARNING): Connection lost to 127.0.0.1:20108 due to [Errno 32] Broken pipe
2023-05-26T16:08:02.895000Z TestFramework (INFO): Stopping nodes
[node 0] Cleaning up leftover process
------- Stderr: -------
Traceback (most recent call last):
  File "/work/test/functional/chronik_shutdown.py", line 76, in <module>
    ChronikShutdown().main()
  File "/work/test/functional/test_framework/test_framework.py", line 165, in main
    exit_code = self.shutdown()
  File "/work/test/functional/test_framework/test_framework.py", line 391, in shutdown
    self.stop_nodes()
  File "/work/test/functional/test_framework/test_framework.py", line 640, in stop_nodes
    node.wait_until_stopped()
  File "/work/test/functional/test_framework/test_node.py", line 548, in wait_until_stopped
    wait_until_helper(
  File "/work/test/functional/test_framework/util.py", line 285, in wait_until_helper
    if predicate():
  File "/work/test/functional/test_framework/test_node.py", line 537, in is_node_stopped
    assert return_code == 0, self._node_msg(
AssertionError: [node 0] Node returned non-zero exit code (-11) when stopping

Each failure log is accessible here:
Bitcoin ABC functional tests: chronik_shutdown.py
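
For readers unfamiliar with the `-11` in the assertion above: on POSIX systems, Python's `subprocess` reports a child killed by signal N as return code -N, and signal 11 is SIGSEGV on Linux, which matches the segmentation fault this revision is about. A small sketch of decoding such a return code follows; `describe_exit_code` is a hypothetical helper, not part of the test framework.

```python
import signal


def describe_exit_code(return_code):
    """Describe a subprocess-style return code in human-readable form."""
    if return_code == 0:
        return "clean exit"
    if return_code < 0:
        # A negative value -N means the process was terminated by signal N (POSIX).
        try:
            name = signal.Signals(-return_code).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {-return_code} ({name})"
    return f"exited with status {return_code}"
```

Calling `describe_exit_code(-11)` on Linux reports that the process was killed by SIGSEGV.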

Failed tests logs:

====== Bitcoin ABC functional tests: chronik_shutdown.py ======

------- Stdout: -------
2023-05-26T16:09:13.284000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-linux-arm/test/tmp/test_runner_₿₵_  _20230526_155924/chronik_shutdown_233
2023-05-26T16:11:19.719000Z TestFramework.utils (ERROR): wait_until() failed. Predicate: ''''
                    lambda: rpc.getmempoolinfo()["loaded"],
'''
2023-05-26T16:11:19.720000Z TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
  File "/work/test/functional/test_framework/test_framework.py", line 142, in main
    self.run_test()
  File "/work/test/functional/chronik_shutdown.py", line 72, in run_test
    self.restart_node(0, ["-chronik", "-reindex"])
  File "/work/test/functional/test_framework/test_framework.py", line 645, in restart_node
    self.start_node(i, extra_args)
  File "/work/test/functional/test_framework/test_framework.py", line 603, in start_node
    node.wait_for_rpc_connection()
  File "/work/test/functional/test_framework/test_node.py", line 366, in wait_for_rpc_connection
    wait_until_helper(
  File "/work/test/functional/test_framework/util.py", line 298, in wait_until_helper
    raise AssertionError(
AssertionError: Predicate ''''
                    lambda: rpc.getmempoolinfo()["loaded"],
''' not true after 60.0 seconds
2023-05-26T16:11:19.773000Z TestFramework (INFO): Stopping nodes
[node 0] Cleaning up leftover process
------- Stderr: -------
Traceback (most recent call last):
  File "/work/test/functional/chronik_shutdown.py", line 76, in <module>
    ChronikShutdown().main()
  File "/work/test/functional/test_framework/test_framework.py", line 165, in main
    exit_code = self.shutdown()
  File "/work/test/functional/test_framework/test_framework.py", line 391, in shutdown
    self.stop_nodes()
  File "/work/test/functional/test_framework/test_framework.py", line 636, in stop_nodes
    node.stop_node(wait=wait, wait_until_stopped=False)
  File "/work/test/functional/test_framework/test_node.py", line 503, in stop_node
    self.stop(wait=wait)
  File "/work/test/functional/test_framework/test_node.py", line 279, in __getattr__
    assert self.rpc is not None, self._node_msg("Error: RPC not initialized")
AssertionError: [node 0] Error: RPC not initialized

Each failure log is accessible here:
Bitcoin ABC functional tests: chronik_shutdown.py

Use stop_nodes instead of restart_node: this triggers the same segfault, but avoids the restart timing out on ARM

@bot build-linux32 build-linux64 build-linux-aarch64 build-linux-arm build-osx build-win64

Failed tests logs:

====== Bitcoin ABC functional tests: chronik_shutdown.py ======

------- Stdout: -------
2023-05-27T09:33:24.051000Z TestFramework (INFO): Initializing test directory /work/abc-ci-builds/build-linux64/test/tmp/test_runner_₿₵_  _20230527_092900/chronik_shutdown_233
2023-05-27T09:34:28.441000Z TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
  File "/work/test/functional/test_framework/test_framework.py", line 142, in main
    self.run_test()
  File "/work/test/functional/chronik_shutdown.py", line 72, in run_test
    self.stop_nodes()
  File "/work/test/functional/test_framework/test_framework.py", line 640, in stop_nodes
    node.wait_until_stopped()
  File "/work/test/functional/test_framework/test_node.py", line 548, in wait_until_stopped
    wait_until_helper(
  File "/work/test/functional/test_framework/util.py", line 285, in wait_until_helper
    if predicate():
  File "/work/test/functional/test_framework/test_node.py", line 537, in is_node_stopped
    assert return_code == 0, self._node_msg(
AssertionError: [node 0] Node returned non-zero exit code (-11) when stopping
2023-05-27T09:34:28.500000Z TestFramework (INFO): Stopping nodes
[node 0] Cleaning up leftover process
------- Stderr: -------
Traceback (most recent call last):
  File "/work/test/functional/chronik_shutdown.py", line 76, in <module>
    ChronikShutdown().main()
  File "/work/test/functional/test_framework/test_framework.py", line 165, in main
    exit_code = self.shutdown()
  File "/work/test/functional/test_framework/test_framework.py", line 391, in shutdown
    self.stop_nodes()
  File "/work/test/functional/test_framework/test_framework.py", line 636, in stop_nodes
    node.stop_node(wait=wait, wait_until_stopped=False)
  File "/work/test/functional/test_framework/test_node.py", line 503, in stop_node
    self.stop(wait=wait)
  File "/work/test/functional/test_framework/coverage.py", line 47, in __call__
    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
  File "/work/test/functional/test_framework/authproxy.py", line 176, in __call__
    response, status = self._request(
  File "/work/test/functional/test_framework/authproxy.py", line 125, in _request
    self.__conn.request(method, path, postdata, headers)
  File "/usr/lib/python3.9/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.9/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.9/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.9/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/usr/lib/python3.9/http/client.py", line 950, in send
    self.connect()
  File "/usr/lib/python3.9/http/client.py", line 921, in connect
    self.sock = self._create_connection(
  File "/usr/lib/python3.9/socket.py", line 843, in create_connection
    raise err
  File "/usr/lib/python3.9/socket.py", line 831, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

Each failure log is accessible here:
Bitcoin ABC functional tests: chronik_shutdown.py

Add the fix; all builds should now succeed

@bot build-linux32 build-linux64 build-linux-aarch64 build-linux-arm build-osx build-win64

This revision is now accepted and ready to land. May 28 2023, 05:52