Page MenuHomePhabricator

[Cashtab] Full refactor of the OP_RETURN parsing logic to revolve around bytecounts
ClosedPublic

Authored by emack on Dec 3 2021, 23:31.

Details

Summary

The current OP_RETURN parsing logic does not properly parse long messages, where the PUSH DATA 1 byte (4c) is inserted prior to the byte count of the subsequent message. This is leading to extra '?' symbols when the messages are displayed on the frontend.

This diff carries out a full refactor of this logic.

In Summary:

  • A single new parseOpReturn function added to Ticker.js which replaces all the previous context specific parsing functions (extractCashtabMessage, isExternalMessage...etc).
  • This new function iterates through the OP_RETURN hex message from beginning to end (left to right).
  • The first half of the loop in each iteration checks the bytecount and converts the base16 hex into base 10 decimal. If the bytecount is '4c', then this is deemed to be a long message where the next byte is analyzed as the bytecount for the subsequent message.
  • The second half of the loop parses only as far as indicated by the preceding bytecount and extracts the message.
  • The full parsed results are added to the the resultArray[], which is structured as follows:

>index 0 for the transaction type (i.e. Cashtab message, external message, eToken)
>index 1 for the actual cashtab message
>index 2...n for any additional message parts as part of an external protocol.

  • parseTx() function in useBCH.js then takes this array and passes the appropriate decoded hex messages to Tx.js for rendering on the frontend.
Test Plan
  • npm start
  • send a long and short message to wallet from within cashtab
  • send a long and short message from electrum
  • send an eToken transaction
  • ensure no unexpected symbols in all instances and that source specific labelling is shown (e.g. cashtab vs external message label)

Diff Detail

Repository
rABC Bitcoin ABC
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

emack requested review of this revision.Dec 3 2021, 23:31
bytesofman requested changes to this revision.Dec 4 2021, 00:10
bytesofman added inline comments.
web/cashtab/src/components/Common/Ticker.js
37 ↗(On Diff #31260)

For values that have specific roles in OP_RETURN, let's use them

https://en.bitcoin.it/wiki/Script

e.g. here:

image.png (33×940 px, 9 KB)

So, let's call it opReturn.opPushDataOne instead of opReturn.longMessagePrefix

161 ↗(On Diff #31260)

I think it's worth doing a bigger refactor here

Right now, workflow from useBCH.js lines 156-186

if (isEtokenOutput(hex)) {
                        // this is an eToken transaction
                        tokenTx = true;
                    } else if (isCashtabOutput(hex)) {
                        // this is a cashtab.com generated message
                        try {
                            substring = extractCashtabMessage(hex);
                            opReturnMessage = Buffer.from(substring, 'hex');
                            isCashtabMessage = true;
                        } catch (err) {
                            // soft error if an unexpected or invalid cashtab hex is encountered
                            opReturnMessage = '';
                            console.log(
                                'useBCH.parsedTxHistory() error: invalid cashtab msg hex: ' +
                                    substring,
                            );
                        }
                    } else {
                        // this is an externally generated message
                        try {
                            substring = extractExternalMessage(hex);
                            opReturnMessage = Buffer.from(substring, 'hex');
                        } catch (err) {
                            // soft error if an unexpected or invalid cashtab hex is encountered
                            opReturnMessage = '';
                            console.log(
                                'useBCH.parsedTxHistory() error: invalid external msg hex: ' +
                                    substring,
                            );
                        }
                    }

So what we have right now is: if it's a token, do something, else if it's a cashtab tx, do something, else if it's something else, do something

Instead, it should be

if token, do something; else, parse it

So instead of having different functions extractExternalMessage and extractCashtabMessage, there should just be one function, parseOpReturn

parseOpReturn should iterate through the OP_RETURN hex message from beginning to end (left to right).

  1. pop off first byte 6a
  2. Get byte count for first message (If second byte is 4c, then third byte is byteCount, otherwise second byte is byteCount)
  3. Parse first message (however many bytes byteCount told you to expect). If first message happens to === 4 bytes and there is a message after it, then the first message is an app prefix, flag it as such
  4. Get byte count and msg for 2nd through nth msgs, adding double line breaks between them
  5. Match the prefix with anything that may be in appPrefixesHex; if no matches, display it as Unknown Prefix <prefix>

Some unit tests that parseOpReturn should pass:

  1. Return a Cashtab message with Cashtab Message flagged and no leading '?' for a message without the 4c byte
  2. Return a Cashtab message with Cashtab Message flagged and no leading '?' for a message with the 4c byte
  3. Return a message with a valid 4 byte prefix flagged with that prefix, so 0x0x0x0x Message instead of Cashtab Message
  4. Return a 4 byte message that is not a prefix as a prefixless 4 byte msg ("External Message")
  5. Return a prefixless 0-3 byte message
  6. Return a prefixed 0-3 byte message
  7. Return an n-part message that has multiple byte counts, for example a message that for whatever reason is constructed like

6a + 04 + validPrefix = 0x0x0x0x + 4c + here comes a 21 byte msg + 21 byte msg + 4c + here comes a 16 byte msg + 16 byte msg

For now, parse this type of thing as a double line break, so the above would look like

21 byte msg <br /><br />
16 byte msg

This revision now requires changes to proceed.Dec 4 2021, 00:10
emack marked an inline comment as done.

Refactored the OP_RETURN parsing logic to parse based on bytecount

emack retitled this revision from [Cashtab] Revised OP_RETURN parsing logic for extra encoding denoting long msgs to [Cashtab] Full refactor of the OP_RETURN parsing logic to revolve around bytecounts.Dec 5 2021, 05:41
emack edited the summary of this revision. (Show Details)
emack edited the test plan for this revision. (Show Details)
emack edited the summary of this revision. (Show Details)

Removed unnecessary loop break in eToken parsing logic

bytesofman requested changes to this revision.Dec 5 2021, 16:53

Good start but I'm a little unclear on some parts.

Goal is to be able to read any arbitrary OP_RETURN string. The prefix may or may not exist. If the message does include a prefix/Protocol ID --- i.e. if the message is in the format of 6a + 04 + 0x0x0x0x + [optional 4c] + byteCount + msg, then 0x0x0x0x is the prefix / Protocol ID.

If the message is of the format 6a + [optional 4c] + byteCount + msg, then there is no prefix / no Protocol ID. The differentiating factor here is the ... + [optional 4c] + byteCount + msg coming after the prefix; since otherwise the msg could be a 4-byte message with no prefix (for now, assume this case, versus the case of a msg that is nothing but a protocol ID).

web/cashtab/src/components/Common/Ticker.js
41 ↗(On Diff #31268)

I'm not sure on the purpose of this. We do not want to be appending arbitrary prefixes. If the app has no prefix or if the app has a prefix that Cashtab does not recognize, then it is an unknown app. But we don't want to modify the hex string of that message.

113 ↗(On Diff #31268)

Should confirm the first byte is indeed 6a. I'm not aware of possible exceptions, but might as well throw them out here; can just do this check in the same pre-validation function above, returning false if first byte is not 6a (indicating the string is not an OP_RETURN msg so parsing should not happen).

136 ↗(On Diff #31268)

In this case, the eToken prefix is the Protocol ID --- I'm not sure what index 0 is reserved for

142 ↗(On Diff #31268)

Shouldn't it be resultArray[0] = currency.opReturn.appPrefixesHex.cashtab ?

Please explain the parsing logic, I'm missing something here.

web/cashtab/src/hooks/useBCH.js
158 ↗(On Diff #31268)

Unclear why this is [1] and not [0]

This revision now requires changes to proceed.Dec 5 2021, 16:53
emack marked 3 inline comments as done.
  • additional check for the first byte being 6a
  • resultArray now revised to consist of:

etoken: index 0 is the prefix, nothing thereafter
cashtab message: index 0 is the cashtab prefix, index 1 is the full message content
external message: index 0 is the message content, index 1 to n are the additional messages if exists

bytesofman requested changes to this revision.Dec 6 2021, 14:37
bytesofman added inline comments.
web/cashtab/src/components/Common/Ticker.js
147 ↗(On Diff #31270)

To improve code readability, create a variable for msgByteSize * 2

i.e.

let msgCharLength = 2 * msgByteSize
hexStr = hexStr.slice(msgCharLength)
web/cashtab/src/components/Common/__tests__/Ticker.test.js
141 ↗(On Diff #31270)

This prefix no longer exists, unit tests need to be refactored

web/cashtab/src/hooks/useBCH.js
198 ↗(On Diff #31270)

Shimming in these <br/> html tags is going to create issues when the app attempts to render in Tx.js

Would be pretty involved to actually get this double line break rendered. So, don't worry about it in this diff. If and when we have apps out there that use this often, we can create a task to handle better rendering.

For now, use something that can be just stuck into the string as a visual cue. Maybe [_]

To test it, could try parsing some eToken transactions. You'll need to modify the function to actually parse the tx in local testing. See that they are rendered in tx history like SLP[_]SEND[_]XXX ...

This revision now requires changes to proceed.Dec 6 2021, 14:37
emack marked 2 inline comments as done.
  • refactored unit tests for external messages in Ticker.test.js
  • improved code readability in Ticker.js
  • Locally tested the parsing of a sample eToken tx all ok. The suggested method of inserting '[_]' between messages causes issues for hex decoding, so I've verified by printing out the array contents onto the console log.

e.g. for https://explorer.be.cash/tx/ef5674832011f0cba6a6adcda21c6acd18390844ef6ff16886122295d759ac0b
see the SLP hex and token ID matching correctly.

image.png (118×820 px, 23 KB)

To replicate this local test:

  1. comment out the eToken break; statement in Ticker.js line 138
  2. add the following loop into useBCH.js line 169 inside the 'if txType === eToken' logic

parsedOpReturnArray.forEach(function (item, index) {

console.log('Item: ' + item, 'Index: ' + index);

});

bytesofman requested changes to this revision.Dec 8 2021, 15:34
bytesofman added inline comments.
web/cashtab/src/components/Common/Ticker.js
35 ↗(On Diff #31297)

hm let's get rid of this, don't need the same value saved in two different ways here. opReturnAppPrefixLengthHex is more descriptive, so keep that one

124 ↗(On Diff #31297)

This is converting from hex (base16) to decimal (base 10); correct the comment

128 ↗(On Diff #31297)

Same comment issue

133 ↗(On Diff #31297)

I said let earlier but this should be const

135 ↗(On Diff #31297)

Should only be checking for prefix matching if i === 0

139 ↗(On Diff #31297)

Should only be checking for prefix matching if i === 0

web/cashtab/src/components/Common/__tests__/Ticker.test.js
116 ↗(On Diff #31297)

These unit tests should test against the entire returned array, and not just individual indices from it. It's confusing for code readability and limiting the domain of the unit test.

Please include mock outputs of resultArrays

This revision now requires changes to proceed.Dec 8 2021, 15:34
emack marked 6 inline comments as done.

Added full unit test coverage for parseOpReturn() function throughputs

emack edited the test plan for this revision. (Show Details)
This revision is now accepted and ready to land.Dec 13 2021, 19:56