
[chronik-cache] Init chronik-cache in monorepo
AbandonedPublic

Authored by alitayin on Jul 31 2025, 10:03.

Details

Reviewers
emack
bytesofman
Group Reviewers
Restricted Owners Package (Owns No Changed Paths)
Restricted Project
Summary

This is a high-performance caching service designed to work with the Chronik Indexer, featuring both persistent and in-memory caching. chronik-cache provides a comprehensive caching layer that automatically caches and updates address histories, token transactions, and script queries in real time. It listens for on-chain changes via WebSocket and synchronizes the cache state automatically.

  • Dual-layer cache architecture: Persistent storage based on LevelDB + in-memory Map cache, providing millisecond-level response times
  • Automatic synchronization mechanism: Listens for transaction events via WebSocket and automatically updates the status of unconfirmed transactions
  • Intelligent pagination: Supports paginated storage and queries for large volumes of data
  • Multiple query types: Supports queries for address histories, SLP/eToken transactions, script conversions, and more
  • Failover mechanism: Built-in retry logic and fallback strategies
  • Performance monitoring: Provides interfaces for tracking cache status and statistics
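The dual-layer lookup described above could be sketched roughly as follows. This is an illustrative sketch only: `DualLayerCache`, `PersistentLayer`, and `fetchFromChronik` are hypothetical names, not the actual chronik-cache API.

```typescript
// Hypothetical sketch of the dual-layer cache: in-memory Map first,
// then the persistent LevelDB layer, then the Chronik indexer itself.

type TxHistory = { txs: string[]; numTxs: number };

interface PersistentLayer {
    get(key: string): Promise<TxHistory | undefined>;
    put(key: string, value: TxHistory): Promise<void>;
}

class DualLayerCache {
    private memory = new Map<string, TxHistory>();

    constructor(
        private disk: PersistentLayer,
        private fetchFromChronik: (key: string) => Promise<TxHistory>,
    ) {}

    async getHistory(address: string): Promise<TxHistory> {
        // 1. Hot path: in-memory Map lookup.
        const hot = this.memory.get(address);
        if (hot !== undefined) return hot;

        // 2. Warm path: persistent LevelDB layer; promote to memory on hit.
        const warm = await this.disk.get(address);
        if (warm !== undefined) {
            this.memory.set(address, warm);
            return warm;
        }

        // 3. Cold path: fall back to the Chronik indexer, then
        //    populate both cache layers.
        const fresh = await this.fetchFromChronik(address);
        this.memory.set(address, fresh);
        await this.disk.put(address, fresh);
        return fresh;
    }
}
```

On a repeated lookup the request never leaves the process, which is where the millisecond-level response times come from.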

Areas for improvement:

  • Better WebSocket handling: Optimize connection management and reconnection mechanisms
  • Browser support: Adapt for browser-side usage scenarios
  • Improved read/write design: Optimize storage structure and I/O performance
  • Add tests: Improve unit and integration test coverage
Test Plan

Here is a simple demo: example.js
A complete test suite will be added in the future, including test cases for core functions such as cache updates, WebSocket event handling, and failover mechanisms.

Diff Detail

Repository
rABC Bitcoin ABC
Branch
chronik-cache
Lint
Lint Passed
Severity: Auto-Fix | Location: modules/chronik-cache/tsconfig.json:34 | Code: WHITESPACE1 | Message: Found trailing whitespace(s).
Unit
No Test Coverage
Build Status
Buildable 34012
Build 67493: Build Diff
Build 67492: arc lint + arc unit

Event Timeline

Owners added a reviewer: Restricted Owners Package. Jul 31 2025, 10:03

There are many areas for improvement, for example the comments and the Chinese text I added. :-)

emack requested changes to this revision. Jul 31 2025, 11:24
emack added a subscriber: emack.

I'm not seeing any tests. Have a look at the other chronik-client mocks and tests in the repo. There are also discrete methods, like the txid sorting function, that can easily be unit tested too.

modules/chronik-cache/.npmignore
1

replace with english comments pls

49

Your arc lint should have picked this up prior to pushing to phab. In any case, pls fix newline lint issues across a few files.

modules/chronik-cache/example.js
1

update across all file headers

5

I don't think an example js file is needed, especially a 587-line one at that. Your readme is already good enough.

The title has a funny typo.

What is the purpose of the persistent cache? How is it different from the base chronik storage?

modules/chronik-cache/example.js
1
alitayin retitled this revision from [chronik-cache] Init chronik-ecash in monorepo to [chronik-cache] Init chronik-cache in monorepo. Aug 1 2025, 04:43

The title has a funny typo.

What is the purpose of the persistent cache? How is it different from the base chronik storage?

There will be a pure in-memory mode. However, between full storage like chronik and pure in-memory, there should also be an "on-demand persistence" option. This kind of flexible persistence is often needed, both in the browser and in Node.js environments.


"Because it's often needed" doesn't answer the question though. Why is it often needed?


Many use cases require repeated access to specific data. For example, the address used by paybutton, or the token used by agora. Storing all of these in memory is not a good option. Selective persistence can better complement memory usage.


Storing all of these in memory is not a good option

Why? Isn't that exactly the point of a cache?
The goal of a cache is to avoid a costly IO operation by replacing it with a less expensive one, typically storing data in memory to avoid a disk lookup, which is orders of magnitude slower. As another example, a browser cache replaces network IO with a local disk lookup.
What is your persistent cache doing? What is the rationale behind it?


Yes, the goal of caching is indeed to replace expensive operations with cheaper ones, which is exactly my design purpose.

Let me explain the design evolution:

Initial design: I originally designed two approaches for different environments:

  • Browser environment: persistent cache + Chronik API
  • Node.js environment: memory cache + Chronik API (hot data/cold data model)

Problem encountered: pure in-memory caching faces size bottlenecks. For tokens or addresses with high transaction volumes, cached data easily reaches hundreds of MBs. While data loss after restarts isn't a primary concern (since restarts are infrequent), memory capacity limitations became a practical obstacle.

Therefore, I introduced the concept of "warm data" as an extension, which is the LevelDB layer.

So let me sum up to check I understand: you have persistent storage on the front end, and both memory and persistent storage on the backend. So the persistent storage on the backend is only useful when the chronik instance is remote, to save on network IO?

modules/chronik-cache/README.md
11

The code is ts, please use it for the example as well

16

It better takes an array

20

Can you explain me this limit?

23

???

24

What is that?

46

What are these values ? Can you explain them?

55

This is a very bad idea imo because I can craft txs that would make your cache very expensive and inefficient. Ws should only mark the entry dirty to cause the next call to be a cache miss

59

That's already part of chronik?

70

How are 2, 3 and 4 different?

74

Ditto

79

I can't make sense of this one either, can you explain?

91

There shouldn't be 2 places to maintain the code, it's confusing and will inevitably go out of sync. One option is to have github as a convenience mirror

93

No need for duplicating it's already in the repo

modules/chronik-cache/example.js
5

I agree, also it's a js example for a ts library which doesn't make much sense

So let me sum up to check I understand: you have persistent storage on the front end, and both memory and persistent storage on the backend. So the persistent storage on the backend is only useful when the chronik instance is remote, to save on network IO?

Yes, when I designed it in February of this year, I was considering the situation where there is no locally running chronik node, which is why I thought of using LevelDB to persist some data. However, the overall structure does have many shortcomings, so I want to have a thorough discussion on how to improve the design before starting on it. The purpose is not just to save resources, but to provide faster access to the desired data than fetching it from the network.

Data Access Solution Design
├── Scenario 1: Pure Frontend Solution
│ ├── Feature: No local chronik node
│ ├── Storage: levelDB persistence
│ └── Goal: Fast client-side data access

├── Scenario 2: Backend without Chronik Solution
│ ├── Feature: Backend service, no chronik dependency
│ ├── Storage Options:
│ │ ├── Persistent storage
│ │ └── Pure in-memory storage
│ └── Goal: Flexible data management

└── Scenario 3: Backend with Chronik Solution

├── Feature: Backend service with existing chronik node
├── Storage: No need for levelDB persistence
└── Goal: Direct utilization of chronik data source
modules/chronik-cache/README.md
24

This was a conceptual mistake I made when writing an npm package for the first time. I thought of exposing failover as a user-configurable retry count.

46

The specific usage flow is as follows. Suppose you access Addr1, Addr2, Addr3, and Addr4 through chronik-cache. It checks whether data for these four addresses exists; if not, it serves the request via the chronik API (when there are fewer than 200 transactions per page) while asynchronously downloading their data at the same time. This is why there are several cache statuses, such as UNKNOWN, UPDATING, and LATEST. When the cache is ready, data is fetched from LevelDB into memory and served to the user (for example, allowing 1000 transactions per page). Users can use these statuses to determine whether the cache is ready. At the same time, Addr1 to Addr4 are subscribed via WebSocket. If transactions arrive later, the local cache is updated; if an address is not used for a while, its WebSocket subscription is dropped. If the LevelDB store exceeds the configured size, the data with the lowest access frequency is evicted.
But I also feel that this approach is a bit like prewarming, not just a cache, so I'm quite conflicted about it.
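The status lifecycle described here might be modeled along these lines. The status names mirror the ones mentioned (UNKNOWN, UPDATING, LATEST); everything else, including `StatusTracker`, is an illustrative assumption, not the real implementation.

```typescript
// Simplified sketch of the cache status lifecycle described above.

enum CacheStatus {
    UNKNOWN = 'UNKNOWN',   // this address/token has never been seen
    UPDATING = 'UPDATING', // async download from chronik in progress
    LATEST = 'LATEST',     // cache is complete and served from memory
}

class StatusTracker {
    private status = new Map<string, CacheStatus>();

    getStatus(key: string): CacheStatus {
        return this.status.get(key) ?? CacheStatus.UNKNOWN;
    }

    markUpdating(key: string): void {
        this.status.set(key, CacheStatus.UPDATING);
    }

    markLatest(key: string): void {
        this.status.set(key, CacheStatus.LATEST);
    }

    // A new on-chain tx invalidates the cache: drop back to UPDATING
    // while the missing txs are fetched.
    onWsTransaction(key: string): void {
        if (this.getStatus(key) === CacheStatus.LATEST) {
            this.markUpdating(key);
        }
    }
}
```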

55

For example, for address Addr1, if there are no requests for this data for a period of time, the WebSocket will be dropped and it will no longer be updated until the next request comes in. If Addr1 receives a large number of requests, the WebSocket duration will increase accordingly with the number of requests.
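The request-scaled subscription lifetime described here could be sketched like this. The class name, TTL values, and extension rule are all hypothetical; the actual chronik-cache logic may differ.

```typescript
// Illustrative sketch: each request pushes a key's WS-unsubscribe
// deadline further out, so frequently requested keys stay subscribed.

class SubscriptionTimer {
    private deadlines = new Map<string, number>();

    constructor(private baseTtlMs: number, private extensionMs: number) {}

    onRequest(key: string, now: number): void {
        const prev = this.deadlines.get(key);
        if (prev === undefined) {
            // First request: subscribe with the base TTL.
            this.deadlines.set(key, now + this.baseTtlMs);
        } else {
            // Repeated requests extend the deadline further.
            this.deadlines.set(
                key,
                Math.max(prev, now + this.baseTtlMs) + this.extensionMs,
            );
        }
    }

    shouldUnsubscribe(key: string, now: number): boolean {
        const deadline = this.deadlines.get(key);
        return deadline !== undefined && now >= deadline;
    }
}
```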

So let me sum up to check I understand: you have persistent storage on the front end, and both memory and persistent storage on the backend. So the persistent storage on the backend is only useful when the chronik instance is remote, to save on network IO?

You're right, I was explaining the original design approach above. It does have some issues, which is why it hasn't been integrated yet.

The original solution didn't have decoupling problems, but it was more oriented towards:

  • How to fetch all data in one go
  • How to cache this data, then write a complex mechanism to ensure data consistency between LevelDB and chronik
  • The cache is only considered "complete" and usable after all transactions for an address or token are fully retrieved
  • Designed specifically for a few tokens/addresses with large transaction volumes
  • Assumed most users are chronik API users who wouldn't deploy chronik themselves
  • Assumed that when a user accesses address A, all records for address A would be asynchronously fetched to form a robust, complete dataset, which inherently consumes extra resources and only generates "benefits" when address A is requested frequently

These design tendencies made it a "prewarming" tool with limited use cases.

So the new approach is:

  • Everything is processed in memory with no DB; browser-side data storage uses localStorage
  • For small amounts of data, there's no need for persistence
  • For large amounts of data, users should deploy chronik rather than creating duplicate mirrors through persistence
  • No extra or duplicate data requests, zero redundancy, fetch on-demand, lightweight state, maintaining transparent proxy behavior: for users, it's simply a faster chronik client

The original design is indeed very useful for "specific scenarios", and I might rename that original design, but chronik-cache needs to switch to the new approach.

I will design a new solution and then present it to you offline.

bytesofman requested changes to this revision. Aug 3 2025, 18:29
bytesofman added a subscriber: bytesofman.
bytesofman added inline comments.
modules/chronik-cache/README.md
3

It's not clear what this module does or why it should be used.

the README should answer those questions

High-performance caching layer

how is this measured?

So let me sum up to check I understand: you have persistent storage on the front end, and both memory and persistent storage on the backend. So the persistent storage on the backend is only useful when the chronik instance is remote, to save on network IO?

My previous designs were not atomic enough; I shaped them into the tools I wanted according to my own preferences. This is a problem with my design, and I will improve. It's just like my ecash-quicksend; the difference between it and ecash-wallet is also here: it's like an upper layer on top of ecash-wallet, so you can make a tx easily, but that is all. It can't fulfill any "requirements" other than this. This is how I understand "atomic".

modules/chronik-cache/.npmignore
49

👍

modules/chronik-cache/README.md
3

I feel like I wrote too much in my previous README, so I tried to simplify it and then let a translation tool translate it automatically, which turned it into "High-performance", lol. For now, it's only effective in the backend environment when dealing with tokens or addresses with several thousand to tens of thousands of transactions. However, this old design needs to be changed, as mentioned in the comments above :-) and I will definitely write tests to verify this after that.


The new architecture will shift to an "on-demand caching + transparent proxy" model:

  • Maintain metadata (such as the total transaction count numTxs) for each address/token, used as a "fingerprint" for cache validity.
  • Calculate the precise position of data in the cache based on pageOffset and pageSize.
  • Listen for new transactions via WebSocket; when numTxs changes, the cache is invalidated.
  • Remove LevelDB and use in-memory caching; browser clients can optionally use localStorage.
Workflow example:

  1. User requests page 2 of data for address A (pageOffset=1, pageSize=200)
  2. The system quickly checks and finds that the address has 1337 transactions in total
  3. Calculate the cache position: transactions 200-399, and check whether this range is already cached
  4. On a cache miss, fetch via the chronik API and place the data precisely in the corresponding position
  5. WebSocket detects 3 new transactions and updates numTxs to 1340
  6. The next time the same request is made, the change in numTxs is detected and the 3 missing transactions are filled in

The cache size can be configured, and entry lifecycle is determined by request volume. For the client side, localStorage is used with the same idea, also with LRU for automatic eviction.

Subscriptions also have their own management process to avoid subscribing to too much content.

In short, the core is to ensure that the fetched data maps to the correct cache slot, that its order and offset are accurate, and to achieve this with minimal resources :-) It could also cover UTXOs and maybe blocks.
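The cache-slot arithmetic in the workflow above can be sketched as follows: a page request maps to an absolute tx index range, and a stored numTxs fingerprint decides whether the cached range is still valid. Function and type names here are illustrative assumptions.

```typescript
// Sketch of the numTxs-fingerprint + page-slot check described above.

interface CachedRange {
    numTxs: number;     // fingerprint at the time of caching
    startIndex: number; // absolute index of the first cached tx
    txids: string[];
}

function pageToRange(pageOffset: number, pageSize: number) {
    const start = pageOffset * pageSize;
    return { start, end: start + pageSize - 1 };
}

function isCacheHit(
    cached: CachedRange | undefined,
    currentNumTxs: number,
    pageOffset: number,
    pageSize: number,
): boolean {
    if (cached === undefined) return false;
    // numTxs changed => new txs arrived, so cached offsets are stale.
    if (cached.numTxs !== currentNumTxs) return false;
    const { start, end } = pageToRange(pageOffset, pageSize);
    const cachedEnd = cached.startIndex + cached.txids.length - 1;
    return start >= cached.startIndex && end <= cachedEnd;
}
```

With pageOffset=1 and pageSize=200, this maps to transactions 200-399, matching step 3 of the workflow; a numTxs change from 1337 to 1340 makes the same request a miss until the 3 missing txs are filled in.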


You don't need metadata, and you should start with memory only. Offloading to disk via the LRU algo can be done later if needed, which I doubt will ever be. Also, localStorage is browser-side, so it's never going to be useful here and you can avoid it.

The transparent proxy (aka the cache extends chronik-client and uses the same api) is good and indicates it should be part of the client.

You want to keep it as simple and useful as possible: I would go with a read through cache design that is marked as dirty by the ws notifications, + LRU for limiting the memory usage.

modules/chronik-cache/README.md
46

This is something you can remove entirely. Not only does this change the chronik pagination behavior, making it difficult for apps to add the cache transparently, but if the data is in the cache, making several calls is very cheap and won't make any difference.

As discussed offline, putting the cache in chronik-client is a better choice. In addition, the initial solution in this diff has some flaws, and the new architecture should not continue here, so let's abandon it.