Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'convert' is broken for edge cases #2055

Closed
3 tasks done
elBoberido opened this issue Oct 23, 2023 · 13 comments · Fixed by #2135, #2152 or #2155
Closed
3 tasks done

'convert' is broken for edge cases #2055

elBoberido opened this issue Oct 23, 2023 · 13 comments · Fixed by #2135, #2152 or #2155
Labels
bug Something isn't working good first issue Good for newcomers refactoring Refactor code without adding features

Comments

@elBoberido
Copy link
Member

elBoberido commented Oct 23, 2023

Required information

Operating system:
all

Compiler version:
all

Eclipse iceoryx version:
all

Observed result or behaviour:
Conversions from strings like nan, inf or DBL_MAX for floating point types or ULLONG_MAX for uint64_t fail.

Expected result or behaviour:
All valid values should be converted.

Conditions where it occurred / Performed steps:
Try to convert inf to double or 18446744073709551615 to uint64_t

Detailed description of the problem

The conversion functions have some weird way to tell the user something went wrong. Let's take strtoull for example

unsigned long long int strtoull(const char *nptr, char **endptr,  int base);

There are three scenarios:

  1. nptr == endptr, the conversion failed because there were no valid characters
  2. nptr != endptr && *endptr != '\0', the conversion was (probably) successful and it depends on the user if this is correct because not all characters were converted, e.g. if the user is passed 123ab then *endptr would be a but if the user expected the whole string to be a number, this should be an error
  3. nptr != endptr && *endptr != '\0' , the conversion was (probably) successful

2 and 3 might still have failed if the return value was ULLONG_MAX and the errno is set to ERANGE.

All of this has to be taken care of or the conversion will fail on edge cases

Test program

This small test program can be used to observe the behavior for string to integer and string to floating point conversion.

#include <cstdlib>
#include <iostream>

int main() {
    struct Test {
        const char* name;
        const char* value;
    };

    const Test integer_tests[] {{"ZERO", "0"},
                                {"ULLONG_MAX", "18446744073709551615"},
                                {"ULLONG_MAX + 1", "18446744073709551616"},
                                {"ULLONG_MAX + ascii", "18446744073709551615q"},
                                {"ULLONG_MAX + 1 + ascii", "18446744073709551616q"},
                                {"String 'COFFEE'", "COFFEE"}
                               };

    std::cout << "#### Integer" << std::endl;
    for(const auto& test: integer_tests) {
        std::cout << "==== " << test.name << std::endl;
        char* end{nullptr};
        errno = 0;
        auto retval = strtoull(test.value, &end, 10);
        std::cout << "tst val: " << test.value << std::endl;
        std::cout << "ret val: " << retval << std::endl;
        std::cout << "errno: " << errno << std::endl;
        std::cout << "val ptr: "<< std::hex << static_cast<const void*>(test.value) << std::dec << std::endl;
        std::cout << "end ptr: " << std::hex << static_cast<void*>(end) << std::dec << " | " << static_cast<uint16_t>(*end) << " | " << end << std::endl;
        std::cout << std::endl;
    }


    const Test floating_point_tests[] {{"PI", "3.14"},
                                       {"DBL_MAX", "1.79769e+308"},
                                       {"nan", "nan"},
                                       {"inf", "inf"},
                                       {"Too large", "2e411234535345"},
                                       {"String 'COFFEE'", "COFFEE"}
                                      };

    std::cout << "#### Floating point" << std::endl;
    for(const auto& test: floating_point_tests) {
        std::cout << "==== " << test.name << std::endl;
        char* end{nullptr};
        errno = 0;
        auto retval = strtod(test.value, &end);
        std::cout << "tst val: " << test.value << std::endl;
        std::cout << "ret val: " << retval << std::endl;
        std::cout << "errno: " << errno << std::endl;
        std::cout << "val ptr: "<< std::hex << static_cast<const void*>(test.value) << std::dec << std::endl;
        std::cout << "end ptr: " << std::hex << static_cast<void*>(end) << std::dec << " | " << static_cast<uint16_t>(*end) << " | " << end << std::endl;
        std::cout << std::endl;
    }
}

Tasks

  • refactor code and make current tests pass
  • additional tests for edge cases
  • determine whether the subnormal float behavior should be unified; currently MSVC converts subnormal floats but GCC and clang does not
@elBoberido elBoberido added bug Something isn't working good first issue Good for newcomers refactoring Refactor code without adding features labels Oct 23, 2023
@elBoberido
Copy link
Member Author

@Dennis40816 The question is whether to only fix this issue or to overhaul the whole conversion class since it still converts to std::string which we eventually want to get rid of. But I think that can be done in a separate issue since the code to convert from a string is completely independent from conversion into a string. For the latter we can reuse what is already be done for the logger.

@elBoberido
Copy link
Member Author

Oh, and be warned, this can be a sisyphean task to get everything right :)

@Dennis40816
Copy link
Contributor

@elBoberido, as you've suggested, let's prioritize fixing the fromString method before addressing the toString method. To start, we should first clarify and define the specific issues within the fromString method.

The current fromString implementation uses IOX_POSIX_CALL to invoke functions such as strtof. Based on your previous description, there are two points to consider:

  1. strtof needs to work in conjunction with errno for accurate determination.
  2. It's also necessary to verify that endptr is at the correct termination point to avoid situations where characters appear after the number.

Discussion:

  1. In my opinion, errno should be checked within fromString.

  2. Should we create a pointer variable initialized with nullptr and put it in the second argument of IOX_POSIX_CALL?

    char* should_be_end = nullptr;
    
    auto call = IOX_POSIX_CALL(strtoull)(v, should_be_end, STRTOULL_BASE).failureReturnValue(ULLONG_MAX).evaluate();

    This would allow us, when invoking IOX_POSIX_CALL, to pass a specific end position rather than nullptr as end_ptr, thus enabling us to check for validity.

@Dennis40816
Copy link
Contributor

Oh, and be warned, this can be a sisyphean task to get everything right :)

I hope one day we can completely fix this :(

@elBoberido
Copy link
Member Author

  1. Agreed. It's an implementation detail
  2. Yes, that's the only way to make the conversion work with edge cases. We still need to decide whether we want to allow a conversion from a string like 42answer\0 or not. For our use case digit only strings like 42\0 are sufficient and I think the current behavior is also to not accept strings which contains non digit characters. This simplifies a few things and if we also want to support 42answer\0 this could be done later on.

I think it would a good idea to add .suppressErrorMessagesForErrnos(EINVAL, ERANGE) to the IOX_POSIX_CALL and only handle error messages in fromString ... or better from_string.

When it comes to the failureReturnValue it becomes more tricky. Please double check but from my understanding a return value of 0 will also indicate a failure if nptr == endptr. ULLONG_MAX also only indicates an error if additionally the errno is set. This is the part which makes it a bit hard to get it right.

We are also free to change the API to our likes. Currently we only support base 10 but it would be quite simple to support base 16 by adding another parameter. This could be done in a follow up with a default value of 10.

The class also predates the introduction of the optional in our codebase and has out parameter. Nowadays we would just return an optional. One could then use it like auto answer = iox::convert::from_string<uin8_t>(str).value_or(42);. Alternatively, if we want to give the user some hints on what went wrong with the conversion and expected could be used. Not sure if it is worth the additional effort though.

Furthermore, we introduced iox::into. It would also be an option to use that. The usage would be auto answer = iox::into<iox::optional<int8_t>>(str).value_or(42);. Or one could extend into into try_into and it would be shorter iox::try_into<int8_t>(str).value_or(42);. But this is also something which can be done as second step.

What are your thoughts?

@elBoberido
Copy link
Member Author

Oh, before you start. We decided to move the classes from iceoryx_dust back to iceoryx_hoofs. Let me move at least the convert class back before you start to work on this issue to prevent merge conflicts later on.

@Dennis40816
Copy link
Contributor

I think it would a good idea to add .suppressErrorMessagesForErrnos(EINVAL, ERANGE) to the IOX_POSIX_CALL and only handle >error messages in fromString ... or better from_string.

Yes, I agree we can add .suppressErrorMessagesForErrnos(EINVAL, ERANGE) to IOX_POSIX_CALL in from_string.

When it comes to the failureReturnValue it becomes more tricky. Please double check but from my understanding a return value of 0 will also indicate a failure if nptr == endptr. ULLONG_MAX also only indicates an error if additionally the errno is set. This is the part which makes it a bit hard to get it right.

I believe that you are correct. Similar issue for strtoull was also discussed here.

We are also free to change the API to our likes. Currently we only support base 10 but it would be quite simple to support base 16 by adding another parameter. This could be done in a follow up with a default value of 10.

Sure.

The class also predates the introduction of the optional in our codebase and has out parameter. Nowadays we would just return an optional. One could then use it like auto answer = iox::convert::from_string<uin8_t>(str).value_or(42);. Alternatively, if we want to give the user some hints on what went wrong with the conversion and expected could be used. Not sure if it is worth the additional effort though.

Ok, let's implement optional version first.

Furthermore, we introduced iox::into. It would also be an option to use that. The usage would be auto answer = iox::into<iox::optional<int8_t>>(str).value_or(42);. Or one could extend into into try_into and it would be shorter iox::try_into<int8_t>(str).value_or(42);. But this is also something which can be done as second step.

Not sure it's correct or not but I think iox::try_into can return iox::expected. Therefore, we can combine these two in second step. Btw, I'm not pretty sure what's the relationship between iox::into and iox::convert. In Rust, from and into are traits in std::convert.

@elBoberido
Copy link
Member Author

iox::convert is one of our older classes. It was created after iceoryx left the state of a research project but still at the beginning of the project when we did not have the building blocks we have today and before some of the devs fell in love with Rust. iox::into is quite new and the idea is borrowed from Rust. While C++ has a conversion operator for classes, there is no idiomatic way to convert one enum to another one and this is where iox::into was introduced. Later on we also used it to decouple the iox::string from the std::string and still be able to convert one into the other. Thinking this further, iox::into can become the generic conversion mechanism in iceoryx. We could for example have a generic implementation for iox::into for all classes which implement a conversion operator. But that's something for a separate issue :)

@elBoberido
Copy link
Member Author

Oh, regarding this issue. I think we can get rid of convert::stringIsNumber. It is used only at one place outside of convert and there it is not really needed.

We can just rely on the actual C function to tell us that a conversion failed. This should simplify the code a bit and also make it more performant.

@elBoberido
Copy link
Member Author

Okay, convert is back in iceoryx_hoofs. You can now start whenever you like without having to do additional work :)

@Dennis40816
Copy link
Contributor

Dennis40816 commented Dec 9, 2023

iox::convert is one of our older classes. It was created after iceoryx left the state of a research project but still at the beginning of the project when we did not have the building blocks we have today and before some of the devs fell in love with Rust. iox::into is quite new and the idea is borrowed from Rust. While C++ has a conversion operator for classes, there is no idiomatic way to convert one enum to another one and this is where iox::into was introduced. Later on we also used it to decouple the iox::string from the std::string and still be able to convert one into the other. Thinking this further, iox::into can become the generic conversion mechanism in iceoryx. We could for example have a generic implementation for iox::into for all classes which implement a conversion operator. But that's something for a separate issue :)

Thanks for your knowledge :)

All right. I create a todo list here to remind myself. If anything miss, feel free to leave a comment.

First step

  • Rename fromString to from_string.

  • Add end_ptr and pass it into the conversion function (e.g., strtoull) as second argument.

  • Store errno in a variable cache, maybe called errno_cache. Preventing errno from being changed unexpectedly.

  • Add suppressErrorMessagesForErrnos to IOX_POSIX_CALL in from_string

  • Change the from_string API. Return an iox::optional instead of a bool value. Also, the out parameter(dest) will be deleted.

  • Determine an error happened or not by errno_cache, end_ptr and return value. When an error occurs, the return value of the type iox::optional should be empty. That is

    bool should_be_false = iox::convert::from_string("BadString").has_value();
  • Remove iox::convert::stringIsNumber

Second step

  • Implement iox::into and iox::try_into for string to number conversion with the iox::expect as return type. Hope this can be the stepping stone making iox::into become the generic conversion mechanism in iceoryx. (Maybe another issue?)
  • Expand the function of conversion so that the more complicate string such as 42answer\0 can also be taken as valid conversion.

@Dennis40816
Copy link
Contributor

For the PR, I think I'll be quite busy within following two weeks. I'll try my best to create one.

@elBoberido
Copy link
Member Author

I'm not quite sure what you mean by Store errno in a variable cache, maybe called errno_cache but it is an implementation detail and I think you know how to implement it in a sane way :)

The plan looks good. Go ahead.

For the PR, I think I'll be quite busy within following two weeks. I'll try my best to create one.

Don't worry. Start whenever you have time. There is no pressure in getting this done anytime soon. As long as you have fun contributing to the project we are happy too :)

Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Dec 11, 2023
Rename `fromString` to `from_string`. Note that the name of test cases
are not included.

Signed-off-by: Dennis Liu <[email protected]>
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Dec 13, 2023
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Dec 13, 2023
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Dec 13, 2023
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Dec 13, 2023
Remove dest for numeric conversion. Now, the part of `for_string` will
return iox::optional.

Also, we introduce `validate_return_value` and `check_edge_case` to
simplify the code of checking. The former can be used to return an
iox::optional object while the latter is used in the former to check
whether any conversion failure happened.

Signed-off-by: Dennis Liu <[email protected]>
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 8, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 8, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 11, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 11, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 11, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 11, 2024
elBoberido pushed a commit to elBoberido/iceoryx that referenced this issue Jan 11, 2024
elBoberido pushed a commit to elBoberido/iceoryx that referenced this issue Jan 11, 2024
elBoberido pushed a commit to elBoberido/iceoryx that referenced this issue Jan 11, 2024
elBoberido pushed a commit to elBoberido/iceoryx that referenced this issue Jan 11, 2024
elBoberido pushed a commit to elBoberido/iceoryx that referenced this issue Jan 11, 2024
elBoberido pushed a commit to elBoberido/iceoryx that referenced this issue Jan 11, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 14, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 14, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 14, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 14, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 14, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 14, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 14, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 14, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 14, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 14, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 14, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 14, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 14, 2024
- only MSVC will treat subnormal as valid input.

Signed-off-by: Dennis Liu <[email protected]>
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 15, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 15, 2024
Dennis40816 added a commit to Dennis40816/iceoryx that referenced this issue Jan 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers refactoring Refactor code without adding features
Projects
None yet
2 participants