Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: replace memset with a loop #2465

Merged
merged 1 commit into from
Dec 14, 2023

Conversation

nurupo
Copy link
Member

@nurupo nurupo commented Dec 13, 2023

memset() treats the passed buffer as a char* array, assigning to every 1-byte of the array the value. So for a single 4-byte int32_t element, it is assigning bytes 0, 1, 2 and 3 of it to -1. It happens that -1 is 0xFF, so in the end the uint32_t is set to 0xFFFFFFFF, which is -1 in the two's complement, so the memset() actually produces the correct result in the end, assuming the platform uses two's complement integers.

Assigning it in the loop is less error-prone, as using memset() on non-1-byte wide arrays with a non-zero value is fishy, and it is more portable as we don't have to assume the use of two's complement.

On IRC, @robinlinden has pointed out that two's complement is the only integer format in C23 so perhaps we shouldn't be as concerned with the portability, but @iphy said that it's still a good idea to use a for-loop for this case.

It looks like in a future version of the C standard, C23, two's complement is the only integer format in C23 (thanks to @robinlinden on IRC for pointing that out), so perhaps we shouldn't be as concerned with the portability here? Though @iphydf says that it's still a good idea to use a for-loop for this case.


This change is Reviewable

Copy link

codecov bot commented Dec 13, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (6645343) 74.51% compared to head (55a7600) 71.60%.
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2465      +/-   ##
==========================================
- Coverage   74.51%   71.60%   -2.92%     
==========================================
  Files          87       75      -12     
  Lines       26313    25189    -1124     
==========================================
- Hits        19607    18036    -1571     
- Misses       6706     7153     +447     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@nurupo nurupo force-pushed the replace-memset-with-a-loop branch from e90d77f to 893d49b Compare December 13, 2023 11:49
@nurupo nurupo marked this pull request as ready for review December 13, 2023 11:53
@nurupo nurupo requested a review from JFreegman December 13, 2023 12:03
Copy link
Member

@Green-Sky Green-Sky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a simple for-loop is trivial for a compiler to optimize. I would not be surprised if the compiler uses memset or similar in optimized binaries from this code.

Copy link
Member

@robinlinden robinlinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, I'm pro for-loop and portability. My comments on IRC were only about the fact that C++20 and C23 require two's complement for signed integers and unrelated to this change.

Copy link
Member

@JFreegman JFreegman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: 0 of 1 approvals obtained (waiting on @nurupo)

a discussion (no related file):
I'm fine with this change but I think in the future we need to properly define what we consider to be portable-enough code for our purposes. There are probably many obscure systems that Tox would never run on regardless of an isolated improvement like this one, making these sorts of PR's effectively pointless in the grander scheme of things. What systems that people might use Tox on don't actually use two's compliment that we're making an active effort to support? We should have the CI run tests on all the systems that we want to actively support. We don't have to require them to pass at first, but the fact that they exist and give us some sort of a tangible goal would be very helpful.

Also, could you squash the bootstrap hash commit?


@nurupo
Copy link
Member Author

nurupo commented Dec 13, 2023

@JFreegman The portability is one half of it, the other is that using memset() on non-1-byte wide arrays with a non-zero value is rather suspicious, as the programmer might have assumed that the value is assigned to every int32_t instead of every char. (This is a bit far fetched, but a person refactoring the code might read it as such too, e.g. mistakenly setting value to -2 thinking that it would work that way, assuming a hypothetical refactor requires -2 initializing the array). Now that I thought about it some more, if it was ~0 instead of -1, i.e. "all bits set", then the intention would be clearer.

@nurupo nurupo force-pushed the replace-memset-with-a-loop branch from 893d49b to fa337ca Compare December 13, 2023 23:30
@nurupo
Copy link
Member Author

nurupo commented Dec 13, 2023

Squashed.

@nurupo nurupo force-pushed the replace-memset-with-a-loop branch 2 times, most recently from 8ae6f58 to a83f6d0 Compare December 13, 2023 23:39
memset() treats the passed buffer as a char* array, assigning to every
1-byte of the array the value. So for a single 4-byte int32_t element,
it is assigning bytes 0, 1, 2 and 3 of it to -1. It happens that -1 is
0xFF, so in the end the uint32_t is set to 0xFFFFFFFF, which is -1 in
the two's complement, so the memset() actually produces the correct
result in the end, assuming the platform uses two's complement integers.

Assigning it in the loop is less error-prone, as using memset() on
non-1-byte wide arrays with a non-zero value is fishy, and it is more
portable as we don't have to assume the use of two's complement.

It looks like in a future version of the C standard, C23, two's
complement is the only integer format in C23 (thanks to @robinlinden on
IRC for pointing that out), so perhaps we shouldn't be as concerned with
the portability here? Though @iphydf says that it's still a good idea to
use a for-loop for this case.
@nurupo nurupo force-pushed the replace-memset-with-a-loop branch from a83f6d0 to 55a7600 Compare December 13, 2023 23:40
@nurupo
Copy link
Member Author

nurupo commented Dec 13, 2023

@robinlinden reworded the commit message, hopefully you like it better now. Also @'ed the correct iphy. Sorry @iphy for the email spam.

Copy link
Member

@JFreegman JFreegman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: :shipit: complete! 3 of 1 approvals obtained (waiting on @nurupo)

@nurupo nurupo added this to the v0.2.19 milestone Dec 14, 2023
@nurupo nurupo merged commit 55a7600 into TokTok:master Dec 14, 2023
36 of 38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants