The purpose of this improvement is to address the following issue: when connecting to a node that can establish a WebSocket connection (_websocketUrlConnects passes) but immediately fires onerror or onclose after connecting, for example because of internal failures in chronik (network-related, resource-related, caused by manual operations, or otherwise), the system continuously attempts to reconnect to the faulty URL instead of switching to another one. This leads to two problems:
1. High-frequency cyclic requests to the faulty URL, which continue until the WebSocket connection itself fails to establish; and
2. Users cannot switch URLs to resolve the failure.
This issue was encountered on March 17th on chronik-native1, which faced numerous reconnection requests that consumed significant resources, delayed its return to normal service, and required manual intervention to resolve.
The solution is to add a delay using setTimeout (which may also prevent stack overflow issues in extreme cases, though this is uncertain) while ensuring node switching. The fallback delay is calculated dynamically, adjusting the delay time based on the number of nodes.
For example, with only 1 node, the client keeps retrying with a 500ms delay: it won't exit, but at least it won't cause the issues described above. With 5 nodes, the base delay is 500 divided by the square of the node count, i.e. 20ms, and the actual delay then scales with this._workingIndex (0 to 4):
When this._workingIndex = 0: 20 * 1 = 20ms
When this._workingIndex = 1: 20 * 2 = 40ms
When this._workingIndex = 2: 20 * 3 = 60ms
When this._workingIndex = 3: 20 * 4 = 80ms
When this._workingIndex = 4: 20 * 5 = 100ms
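The calculation above can be sketched in TypeScript. This is an illustrative sketch, not the actual chronik-client code: the names fallbackDelay, BASE_DELAY_MS, and numUrls are assumptions for the example.

```typescript
// Illustrative sketch of the dynamic fallback delay; fallbackDelay,
// BASE_DELAY_MS, and numUrls are assumed names, not actual
// chronik-client identifiers.
const BASE_DELAY_MS = 500;

function fallbackDelay(numUrls: number, workingIndex: number): number {
    // Base delay shrinks with the square of the node count,
    // then grows linearly as workingIndex advances through the list.
    const base = BASE_DELAY_MS / (numUrls * numUrls);
    return Math.floor(base * (workingIndex + 1));
}

// With 1 node the delay stays at 500ms:
console.log(fallbackDelay(1, 0)); // 500
// With 5 nodes the delays step 20, 40, 60, 80, 100ms:
console.log([0, 1, 2, 3, 4].map((i) => fallbackDelay(5, i)));
```

In the client this value would be passed to setTimeout before the next reconnection attempt, so each retry both waits and advances to the next URL.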
The purpose is to avoid switching nodes too quickly when all nodes are unavailable (they pass _websocketUrlConnects but are actually faulty) while maintaining retry efficiency. For example, one full pass takes 500ms with 1 node, 125 + 250 = 375ms with 2 nodes, and 55 + 111 + 166 = 332ms with 3 nodes.
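Those per-pass totals can be checked with a short computation. fallbackDelay here mirrors the illustrative formula floor((500 / n²) × (index + 1)); it is a sketch under that assumption, not the actual client code.

```typescript
// Sketch: total delay for one full pass over all URLs, assuming
// delay = floor((500 / n^2) * (index + 1)) per step.
function fallbackDelay(numUrls: number, workingIndex: number): number {
    return Math.floor((500 / (numUrls * numUrls)) * (workingIndex + 1));
}

function fullCycleMs(numUrls: number): number {
    let total = 0;
    for (let i = 0; i < numUrls; i++) {
        total += fallbackDelay(numUrls, i);
    }
    return total;
}

console.log(fullCycleMs(1)); // 500
console.log(fullCycleMs(2)); // 375  (125 + 250)
console.log(fullCycleMs(3)); // 332  (55 + 111 + 166)
```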
This solution resolves the issues described above: the reconnection mechanism functions properly in various situations without getting stuck in resource-intensive loops.