
We assume uint8_t is an alias of unsigned char.
ClosedPublic

Authored by fpelliccioni on Dec 16 2019, 18:12.

Details

Summary

char, unsigned char, and std::byte (C++17) are the only byte types according to the C++ Standard.
A byte type is a type that can be used to observe an object's value representation.
We use uint8_t throughout the codebase to observe bytes, so we have to ensure that uint8_t is an alias for a byte type.

References:
http://eel.is/c++draft/basic.types
http://eel.is/c++draft/basic.memobj#def:byte
http://eel.is/c++draft/expr.sizeof#1
http://eel.is/c++draft/cstdint#syn

Test Plan
ninja check

Diff Detail

Repository
rABC Bitcoin ABC
Branch
feature-uint8_t_vs_unsigned_char
Lint
Lint Warnings
Severity | Location | Code | Message
Warning | src/compat/assumptions.h:42 | STDINT1 | `uint8_t` should be preferred over `unsigned char`.
Warning | src/compat/assumptions.h:43 | STDINT1 | `uint8_t` should be preferred over `unsigned char`.
Warning | src/compat/assumptions.h:51 | STDINT1 | `uint8_t` should be preferred over `unsigned char`.
Unit
No Test Coverage
Build Status
Buildable 8572
Build 15134: Default Diff Build & Tests
Build 15133: arc lint + arc unit

Event Timeline

fpelliccioni edited the summary of this revision. (Show Details)
fpelliccioni edited the summary of this revision. (Show Details)
jasonbcox requested changes to this revision. Dec 16 2019, 18:15
jasonbcox added a subscriber: jasonbcox.

The change doesn't appear to match the description. Did you get bit by the linter?

This revision now requires changes to proceed. Dec 16 2019, 18:15

> The change doesn't appear to match the description. Did you get bit by the linter?

Yeah, the linter bit me and I didn't notice.

Thanks for this! I was wondering about it. I have seen some discussions where people want compilers to stop treating uint8_t as an alias for a char type. That has the advantage of enabling better optimization, but apparently it breaks ABI on existing platforms: https://stackoverflow.com/questions/26297571/how-to-create-an-uint8-t-array-that-does-not-undermine-strict-aliasing

By the way, in which parts of the code do we assume uint8_t is a byte type for aliasing purposes? In many parts of the code it seems to be used because the code specifically wants an octet, which makes sense.

(BTW, another assumption I don't see documented here is that CHAR_BIT == 8. Just because sizeof(int) == 4 doesn't necessarily mean that int is 32 bits long -- sizeof counts bytes, and a byte is CHAR_BIT bits.)

This isn't just a theoretical concern about the PDP with its CHAR_BIT of 9 -- I wouldn't be surprised if some future architectures chose CHAR_BIT == 16 or even 32. (https://stackoverflow.com/questions/32091992/is-char-bit-ever-8)

Fabien requested changes to this revision. Dec 17 2019, 10:13

@markblundeberg see https://reviews.bitcoinabc.org/D4680 for a case that uses this assumption (not entirely true since there is a memcpy). I found some other places as well by grepping for (uint8_t *) to find the suspicious casts.

With your rationale it makes sense to assert CHAR_BIT == 8 as well, or we might want to switch to int32_t everywhere it's needed.

This revision now requires changes to proceed. Dec 17 2019, 10:13

> @markblundeberg see https://reviews.bitcoinabc.org/D4680 for a case that uses this assumption (not entirely true since there is a memcpy). I found some other places as well by grepping for (uint8_t *) to find the suspicious casts.
>
> With your rationale it makes sense to assert CHAR_BIT == 8 as well, or we might want to switch to int32_t everywhere it's needed.

Yeah, I've seen a huge number of places make this assumption. It's not going to be violated any time soon (currently only some DSP chips use CHAR_BIT != 8, and POSIX guarantees CHAR_BIT == 8), but it doesn't hurt to add a static assert.

This revision is now accepted and ready to land. Dec 17 2019, 19:06