The purpose of this improvement is to address the following issue: whenever there are internal failures in chronik, whether network-related, resource-related, caused by manual operations, or otherwise, the onerror/onclose handling continuously attempts to reconnect to the same faulty URL instead of switching to another URL. This leads to two problems:
1. High-frequency cyclic requests to the URL, and
2. Users cannot switch URLs to resolve the failure.
This issue was encountered on March 17th on chronik-native1, which faced numerous reconnection requests, delaying its return to normal service and requiring manual intervention to resolve.

**Before modification**: when a WebSocket connection to a URL is terminated due to a fault, triggering the onclose event callback, the system automatically attempts to reconnect to the same closed url for failover, which may not succeed and poses a risk of recursion.

**After modification**: when a WebSocket connection to a URL is terminated due to a fault, triggering the onclose event callback, the system automatically attempts to connect to the next available url for failover.
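A minimal sketch of the after-modification behavior. The class and method names (`FailoverController`, `connectWs`, `urls`) are illustrative assumptions, not this diff's actual API; only `_workingIndex` comes from the description above:

```typescript
// Sketch only: names are assumptions, not the diff's API.
class FailoverController {
    private _workingIndex = 0;

    constructor(private readonly urls: string[]) {}

    /** Open a websocket against the currently selected url. */
    public connectWs(): void {
        const ws = new WebSocket(this.urls[this._workingIndex]);
        ws.onclose = () => {
            // Before modification: reconnect to the same url, risking a
            // high-frequency retry loop against a faulty endpoint.
            // After modification: advance to the next available url.
            this._workingIndex = (this._workingIndex + 1) % this.urls.length;
            this.connectWs();
        };
    }
}
```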
The solution is to add a delay using setTimeout (which may also prevent stack overflow issues in extreme cases, though this is uncertain) while still ensuring node switching. A dynamic fallback delay calculation adjusts the delay to the number of configured nodes. For example, with only 1 node the client keeps retrying with a 500ms delay: it never gives up, but at least it no longer causes the issues described above. With 5 nodes, the base delay is 500 divided by the square of the node count, i.e. 20ms, and the actual delay then scales with this._workingIndex (0 to 4), as sketched in the code after this list:
When this._workingIndex = 0: 20 * 1 = 20ms
When this._workingIndex = 1: 20 * 2 = 40ms
When this._workingIndex = 2: 20 * 3 = 60ms
When this._workingIndex = 3: 20 * 4 = 80ms
When this._workingIndex = 4: 20 * 5 = 100ms
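A minimal sketch of this dynamic fallback delay, assuming a hypothetical helper name (`getFailoverDelay`) that is not from the diff. The 500ms base, the division by the square of the node count, and the linear scaling by index come from the description above; flooring is assumed so the results match the rounded figures quoted:

```typescript
const BASE_DELAY_MS = 500;

// Hypothetical helper, not the diff's actual API: the base delay shrinks
// with the square of the node count, then grows linearly with the index
// of the node currently being tried.
function getFailoverDelay(nodeCount: number, workingIndex: number): number {
    const base = BASE_DELAY_MS / (nodeCount * nodeCount);
    return Math.floor(base * (workingIndex + 1));
}

// Defer the reconnect through the event loop instead of re-entering the
// connect path synchronously from the onclose callback.
function scheduleReconnect(connect: () => void, delayMs: number): void {
    setTimeout(connect, delayMs);
}
```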
The purpose is to avoid switching nodes too quickly when all nodes are unavailable (a node can pass _websocketUrlConnects but still be faulty) while maintaining retry efficiency. For example, a full pass takes 500ms with 1 node, 125 + 250 = 375ms with 2 nodes, and 55 + 111 + 166 = 332ms with 3 nodes, as shown in the snippet below.
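For illustration, summing the hypothetical `getFailoverDelay` helper over one full pass of the node list reproduces the totals quoted above:

```typescript
for (const nodeCount of [1, 2, 3]) {
    let fullPassMs = 0;
    for (let i = 0; i < nodeCount; i++) {
        fullPassMs += getFailoverDelay(nodeCount, i);
    }
    console.log(`${nodeCount} node(s): ${fullPassMs}ms`);
    // Prints 500ms, 375ms, and 332ms respectively.
}
```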
This solution resolves the issues described above: it allows the reconnection mechanism to function properly in each of these situations without getting stuck in resource-intensive loops.

Update: the previous delay handling has been removed; this diff only ensures that faulty urls can be switched correctly. The improvements to the WsEndpoint object mentioned in the diff will be handled in future diffs.