-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace kstream::to_string() - not just int
, fix portability
#50
Conversation
Confirmed by https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html: $ cat neg_long_min.cpp
#include <iostream>
#include <limits>
int main() {
long number = std::numeric_limits<long>::min();
std::cout << -number << std::endl;
}
$ clang++ -fsanitize=undefined neg_long_min.cpp
$ ./a.out
neg_long_min.cpp:5:15: runtime error: negation of -9223372036854775808 cannot be represented in type 'long'; cast to an unsigned type to negate this value to itself
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior neg_long_min.cpp:5:15 in
-9223372036854775808 |
Hi all! First, I must embarrassingly admit, I'm now kind of rusty on C++ and templates...I've been working in Python for the past couple of years (how time flies!...). But, with that said, there's no reason not to use templates, if they are needed and they solve the problem. However, along those lines, I'm wondering if they are needed. That is, suppose Also, So, with all that as background, I'm wondering if an approach like this would work:
I think this approach removes the need for templating for I'm thinking that this will be more portable and much less reliant on fancy template stuff, which might make C++98 support easier. (But, obviously, you all have delved much more deeply into this stuff than I have, so there might be something big that I'm missing.) |
@generalmimon, is there still a problem with C++98 with your implementation or not? Your description reads to me like it works because of an implemented workaround. If there's no real problem right now, your implementation works in the tests and seems to pretty much reflect what others use in the web according your research and provided discussions, I suggest simply using that. Any other implementation 1. needs to be created at all, 2. needs to work like the current one, 3. needs to provide some benefits and 4. needs to be understood by others as well. The only benefit I see is better C++98 support in theory(!), which doesn't seem worth it in my opinion. C++98 support won't be there forever anyway. OTOH, a totally different approach than what has been researched as done by many other people in similar cases always introduces a risk of doing things wrong. |
@ams-tschoening, my principal comment on this PR is that the proposed change seems like it might be more complex than it needs to be and that the added complexity is coming from the C++ support (i.e., the template magic) and especially from the additional code required to support C++98. The fundamental part of this change (i.e., But, if you're comfortable with the understandability and maintainability of the proposed code, I will happily withdraw my suggestion. |
2ff296b
to
819e595
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
The cast in static_cast<char>('0' + number % 10)
is a necessary evil because you want the result of the expression to be a char
but one of the operands (number
) is an uint64_t
. You could presumably apply the cast to just the result of the modulo operation, but I don't think that improves the readability of the code.
The expression, -static_cast<uint64_t>(val)
is pretty scary (what does it really mean to negate an unsigned value??), but if the standards guys say that it has well-defined behavior that does what we want, I guess it's OK. 😁
The net result of this change is pretty effective and pretty minimal, so it looks good!
We only check `std::numeric_limits<I>::max()`, because `min()` is negative for signed types, and we need its absolute value (so we can compare it with `std::numeric_limits<uint64_t>::max()`). However, this value won't fit into the same signed type in two's complement representation, so we need to use at least an unsigned type of the same width - but in GCC, `std::make_unsigned<__int128_t>` is not available, and `uintmax_t` is `uint64_t`, not `__uint128_t`. I don't know how to get the "real" `uintmax_t` including implementation-specific extended types like `__uint128_t` - maybe there's no platform-independent way to do it. Fortunately, we don't have to deal with this for the `max()` case at all, which works just fine, so we'll at least check that.
@webbnh @ams-tschoening Thanks for your comments! |
See #8 (comment)
Fix #28, close #39
Cc @GreyCat, @webbnh, @ams-tschoening
Current
kaitai::kstream::to_string
implementation only works forint
and usessnprintf
, which is not portable, as pointed out in #28. Also, according to the benchmark at https://www.zverovich.net/2020/06/13/fast-int-to-string-revisited.html,sprintf
is almost the slowest function that was tested there (slower were onlyboost::format
and on macOSstd::ostringstream
).Therefore, I chose the simple
decimal_from
function as a replacement (also included in the referred benchmark) by Alf P. Steinbach (@alf-p-steinbach), which scores well in the benchmark (not so important, but for a good feeling), but more importantly is elegant and portable (as far as I can say). I've addedstatic_cast<char>
around('0' + number % 10)
, because without it I get this warning in CLion:(if the parameter has type
unsigned long
). There are other instances of this in the runtime library, but I'll deal with them later.Also, if I wasn't missing something, this part in the original code (https://ideone.com/nrQfA8) looked like an undefined behavior for
number = LONG_MIN
to me (if signed integers use two's complement):If
LONG_MIN
is-0x8000_0000
(which is< 0
),-number
overflows, because0x8000_0000
cannot be represented in thelong
type (theLONG_MAX
would be0x7fff_ffff
). And signed integer overflow is undefined behavior in C++. From what I've found, it really is UB - see https://stackoverflow.com/a/8917528. Although-LONG_MIN
sometimes yieldsLONG_MIN
again (which is what the original code apparently relies on), so it looks like it works, it's not guaranteed.Loosely related (because these wild manifestations seem to be specific to division operations), things like
INT_MIN / -1
orINT_MIN % -1
do all kinds of stuff that one wants to avoid: https://kqueue.org/blog/2012/12/31/idiv-dos/, https://www.quora.com/Why-does-INT_MIN-1-evaluate-to-0-in-C-in-an-x86-64-Linux-machineI've added the functions as templates - this causes some trouble in C++98 that we still support (I had to imitate
std::make_unsigned
with a crude implementation and drop checks whether the type parameter makes sense), but overall I hope that it makes sense. I haven't used C++ templates before, so I would really appreciate some review. I've ran our test suite for both C++98 and C++11 targets and it seems to work, but I can't be sure there aren't some pitfalls.I was thinking if we should use templates for this at all, but without them, there is still no single "one size fits all" type. We would need multiple versions - at least a version for type
int64_t
anduint64_t
(because onlyint64_t
can represent numbers from-7fff_ffff_ffff_ffff
to-1
and onlyuint64_t
can store values from0x8000_0000_0000_0000
to0xffff_ffff_ffff_ffff
). At first, I thought I will handle these variants with macros, but that feels hacky. Templates are intuitively a better choice.