Page MenuHomePhabricator

[Chronik-client] - Cycle through backup instances on outage
ClosedPublic

Authored by emack on Jul 16 2023, 03:54.

Details

Summary

T3236

As part of mitigating the manual interventions involved in D14096 when the supplied Chronik instance keels over, this diff introduces a new FailoverProxy class exported from failoverProxy.ts which manages both regular and websocket endpoints, as well as the logic to cycle through an array of url/wsUrl endpoints should one become non-responsive.

As part of this refactor:

  • FailoverProxy is instantiated by the ChronikClient constructor, and upon instantiation the FailoverProxy constructor uses appendWsUrls to build an array of type Endpoint[] consisting of url and wsUrls.
  • _post, _get, _callAxios and _connect functions have all been moved into this FailoverProxy class in order to centralize all network handling logic and easily track the state of endpoint iteration.
  • Whilst the array of endpoints is cycled through, the most recent working endpoint is tracked via its index in the array to ensure efficient iteration.
  • All existing and future network related functions gets passed an instance of the same FailoverProxy class to ensure the regular and websocket endpoints are always sync'ed at all times.
  • The main update to the websocket connection logic (connectWs) was to close the websocket upon the onerror call back, which allows the loop to iterate to the next wsUrl.
Test Plan

npm test and observe the initial broken chronik instance and automatic switch over to the next available instance for the blockchaininfo test and then the subsequent tests where all urls are invalid.

Diff Detail

Repository
rABC Bitcoin ABC
Branch
urlCycler
Lint
Lint Passed
Unit
No Test Coverage
Build Status
Buildable 24937
Build 49462: Build Diff
Build 49461: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
This revision now requires changes to proceed.Sep 4 2023, 07:13
emack marked 3 inline comments as done.

Responding to feedback

modules/chronik-client/index.ts
318 ↗(On Diff #42054)

Updated comments and reordered functions so appendWsUrls (public for testing only) is right after the constructor, followed by the other public methods.

903 ↗(On Diff #42054)

Removed by mistake, reverted change.

911 ↗(On Diff #42054)

Removed export.

  • Moved FailoverProxy and the Endpoint interface into its own file failoverProxy.ts
  • The Endpoint interface is only used within failoverProxy.ts hence not declared as an export

Re-ordering unit test public function

Out of scope: the tests should run on CI

@bytesofman, @tobias_ruck - ready for your follow up review following Fabien's acceptance, pls see updated summary for reference on this refactored state.

modules/chronik-client/failoverProxy.ts
27–32 ↗(On Diff #42060)

can tighten this up a bit

62 ↗(On Diff #42060)
81 ↗(On Diff #42060)
112 ↗(On Diff #42060)

I'm not immediately understanding this line.

Say _endpointArray has two entries, this._endpointArray.length = 2

if this._workingIndex is 1, then, on the second iteration of this loop when i =1, index = 1 + 1 % 2 , = 3 ?

but then we can't access this._endpointArray[3] since this object is of length 2?

what am I missing here?

211 ↗(On Diff #42060)

same question as above

modules/chronik-client/test/test.ts
608 ↗(On Diff #42060)

note: result will need to be updated to include the invalid url if suggestion above is taken

emack marked 6 inline comments as done.

Responding to feedback

modules/chronik-client/failoverProxy.ts
27–32 ↗(On Diff #42060)

Updated to use conditional statement

62 ↗(On Diff #42060)

Updated

81 ↗(On Diff #42060)

Error() can only take in one arg, hence adding thisUrl inline to the same error message string.

112 ↗(On Diff #42060)

1 modulo 2 is actually 1, but your point still stands.

The brackets were in the wrong place. Corrected this to
index = (this._workingIndex + i) % this._endpointArray.length

i.e. for an array with two entries:

if no workingIndex (i.e. = 0), then the loop is:
i = 0, index = (0 + 0) % 2 = 0
i = 1, index = (0 + 1) % 2 = 1

if workingIndex exists (say, 1), then the loop is:
i = 0, index = (1 + 0) % 2 = 1
i = 1, index = (1 + 1) % 2 = 0 (back to start of array)

211 ↗(On Diff #42060)

Updated as above

modules/chronik-client/test/test.ts
608 ↗(On Diff #42060)

Updated to align with above error message adjustment

bytesofman requested changes to this revision.Sep 6 2023, 21:40
bytesofman added inline comments.
modules/chronik-client/failoverProxy.ts
122 ↗(On Diff #42060)

this._workingIndex = i; is correct...sometimes

if workingIndex exists (say, 1), then the loop is:
i = 0, index = (1 + 0) % 2 = 1
i = 1, index = (1 + 1) % 2 = 0 (back to start of array) [but now you are setting the working index to i, which is 1, even though this is already your working index]

expanding

image.png (299×341 px, 8 KB)

same issue as more URLs are added.

this is pretty unintuitive to do in your head -- need a couple of unit tests so we know this is getting the behavior we expect

228 ↗(On Diff #42060)
This revision now requires changes to proceed.Sep 6 2023, 21:40
emack marked an inline comment as done.
  • Corrected setting of `this._workingIndex
  • Moved this modulo index derivation logic into a separate deriveEndpointIndex function to enable unit tests, including a new setWorkingIndex function to override the index for unit tests.
  • Added unit tests for a four element array, validating index derivation order for cases with a default and custom working index.

Removed duplicate unit test grouping

Updated unit test descriptions

bytesofman added inline comments.
modules/chronik-client/test/test.ts
648 ↗(On Diff #42096)
tobias_ruck added inline comments.
modules/chronik-client/failoverProxy.ts
11 ↗(On Diff #42098)
13 ↗(On Diff #42098)
133 ↗(On Diff #42098)

What about 500 errors? If there's a reverse proxy like nginx in front of a crashed Chronik instance, it will return a 500 error

157 ↗(On Diff #42098)

If you use requestType: "get" | "post", you don't have to check if the user provided something other than 'get' or 'post'. Also maybe capitalize it:

168 ↗(On Diff #42098)

Remove this try-catch entirely, the error already bubbles up and can be handled by the caller.

Alternatively, throw a custom Error below that contains a more explanatory error.

chronik-client is a library and should avoid logging if it makes sense.

183 ↗(On Diff #42098)

Same here

190 ↗(On Diff #42098)

This try-catch is awkward, you prepare an error in ensureResponseErrorThrown, but then catch it again here to post-process it again. Instead, make it such that ensureResponseErrorThrown does everything error handling related; or simply inline ensureResponseErrorThrown here as it's only used once now.

210–211 ↗(On Diff #42098)

Migration history shouldn't be in the code but in the commit history. Instead, explain what the code actually does

213 ↗(On Diff #42098)

Move this into the loop and make it const:

const ws = new WebSocket(thisProxyWsUrl);

Unless there's a reason to keep it out the loop that I'm not seeing?

232 ↗(On Diff #42098)

Move this out of the if, this should have unexpected behavior if the user didn't provide an onConnect callback, which should be common

This revision now requires changes to proceed.Oct 26 2023, 10:54
emack marked 10 inline comments as done.

Responding to feedback. Note: the XPI url was unstable recently and started breaking the test suite. Replaced all instances of this with the working XEC version in test.ts.

modules/chronik-client/failoverProxy.ts
133 ↗(On Diff #42098)

Since the ensureResponseErrorThrown function throws a Failed getting ${path}... error upon a response status that is not 200, which would cover a 500 error, I've added an additional err.message check here for the error message from ensureResponseErrorThrown (called within _callAxios()).

157 ↗(On Diff #42098)

Updated type declaration and capitalization across the class.
Removed the now redundant else block that checked for non-GET/POST request types, although this now necessitated the use of ! on the response object to convince the ts compiler that there should not be a request type that is not GET or POST.

168 ↗(On Diff #42098)

Removed redundant try/catch block.

183 ↗(On Diff #42098)

Removed redundant try/catch block.

190 ↗(On Diff #42098)

Removed try/catch block so that ensureResponseErrorThrown is handling the error.

210–211 ↗(On Diff #42098)

Updated to describe the function

213 ↗(On Diff #42098)

Moved ws declaration into the loop as per feedback

232 ↗(On Diff #42098)

Moved this index setting out of the if as per feedback.

Re the XPI instances: Good idea anyway. I was reindexing them yesteday, that's why the tests failed.

For the test suite it would be better to have a regtest instance eventually (would also allow us to test the tx broadcast); and having that as a library would also help with testing CashTab (without mocking). But that's a future project.

modules/chronik-client/failoverProxy.ts
9 ↗(On Diff #42826)

might be a bit nit-picky, but I would've worded it this way, what do you think?

18 ↗(On Diff #42826)

again a bit picky but I like staying concise, what do you think?

18–19 ↗(On Diff #42826)

again a bit picky but I like staying concise, what do you think?

18–19 ↗(On Diff #42826)

again a bit picky but I like staying concise, what do you think?

180 ↗(On Diff #42826)

This makes me like TypeScript significantly less; I thought it should figure this out on its own. You can also add an else { throw new Error("impossible by types") } at the end if that fixes it

emack marked 5 inline comments as done.

Responding to feedback

modules/chronik-client/failoverProxy.ts
9 ↗(On Diff #42826)

Reworded per feedback

18 ↗(On Diff #42826)

Agree with the first sentence adjustment, which made the 2nd sentence redundant, hence deleted.

180 ↗(On Diff #42826)

Added the suggested error throw in an else block to prevent the TypeScript compiler going vomitron 3000. This allowed the ! to be removed from the response object.

On a separate note, I've also carried out localized usage of this diff in /apps/examples and it tested fine.
i.e.

  1. Refreshed the dist folder within /modules/chronik-client repo via npm run build
  2. Go into /apps/examples and removed current client npm uninstall chronik-client
  3. Installed this diff via npm i ../../modules/chronik-client
  4. Go into /apps/examples/getDetailsFromTxid.js, import the client via the relative path above and instantiated ChronikClient constructor with an array of 1 mispelt url (simulated outage) and 1 correctly spelt url
  5. Executed the example via npm run getDetailsFromTxid <txid> and it should cycle past the 1st url and connects successfully on the 2nd url to return the transaction details.

alright, green. sorry for holding this hostage for so long, but at least I didn't ask for ransom, so I guess it's not that bad

This revision is now accepted and ready to land.Nov 4 2023, 10:04