-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TG-2138] Remove default axioms for strings in string solver #1873
[TG-2138] Remove default axioms for strings in string solver #1873
Conversation
acf7b10
to
a69c472
Compare
f7494f9
to
26ca5b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure all of the commits relate to the subject of the PR?
Wishlist item: please spell-check your commit messages. It's unfortunate that GitHub doesn't permit commenting on commit messages, else I'd provide more detailed comments on those.
stream << "(sr::get_array) returning a dummy string" << eom; | ||
return array_of_exprt( | ||
from_integer(CHARACTER_FOR_UNKNOWN, char_type), | ||
array_typet(char_type, from_integer(n, index_type))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit "Return a dummy string when string is too long " - is there a test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean a unit test? Because it would be difficult to test with end-to-end tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reading of this (but I could be wrong) is that the behaviour changes with the above patch. If so, it would be nice to have a test that covers this change. If there already is one, then it seems surprising that none of the tests needed to be changed. If there is none, create one - whether a unit test or a regression test is up to you, I don't have sufficient insight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I'm dropping this commit. The problem it was trying to avoid is that sometimes get
would run out of memory trying to concretize some long strings, so get
is a more appropriate place to address that (see the commit I added 8d9d9a5).
This issue was occurring in one of tests (jbmc-strings/ValueOf02
) because without the default axioms, non deterministic strings that are not used in the program can be given very big length.
@@ -188,8 +188,7 @@ SCENARIO("instantiate_not_contains", | |||
// Generating the corresponding axioms and simplifying, recording info | |||
symbol_tablet symtab; | |||
const namespacet empty_ns(symtab); | |||
string_constraint_generatort::infot info; | |||
string_constraint_generatort generator(info, ns); | |||
string_constraint_generatort generator(ns); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a change necessitated by one of the earlier commits, please merge it. But it might also just be cleanup, don't really know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
squashed
@@ -1793,7 +1795,7 @@ static std::pair<bool, std::vector<exprt>> check_axioms( | |||
const exprt with_concrete_arrays = | |||
substitute_array_access(negaxiom, gen_symbol, true); | |||
|
|||
stream << indent << i << ".\n"; | |||
stream << indent << i << '.' << eom; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just observing (but not fully understanding) that you use messaget::eom
above while here you seem to be getting away with just eom
. Maybe that's the case above as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this function we defined const auto eom=messaget::eom;
, that's why we can use it here like that.
9ae8079
to
8d9d9a5
Compare
@@ -2428,13 +2431,13 @@ exprt string_refinementt::get(const exprt &expr) const | |||
array_string_exprt &arr = to_array_string_expr(ecopy); | |||
arr.length() = generator.get_length_of_string_array(arr); | |||
const auto arr_model_opt = | |||
get_array(super_get, ns, generator.max_string_length, debug(), arr); | |||
get_array(super_get, ns, max_string_length, debug(), arr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused about the commit message. It seems to be still used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max_string_length is still used in the solver but not in the constraint generator.
@@ -119,15 +119,16 @@ static std::vector<exprt> instantiate( | |||
static optionalt<exprt> get_array( | |||
const std::function<exprt(const exprt &)> &super_get, | |||
const namespacet &ns, | |||
const std::size_t max_string_length, | |||
std::size_t max_string_length, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear why some parameters are const and others aren't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the parameter is not a reference, const
only has an effect at the function definition and not at declaration.
@@ -689,6 +689,42 @@ void output_equations( | |||
<< " == " << from_expr(ns, "", equations[i].rhs()) << std::endl; | |||
} | |||
|
|||
/// Make character array accesses at index greater than max_index return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the consequences of this? Do you detect at a later stage that such an access has happened?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is like constraining the arrays to have all the same character after some index, so if the formula we constructed (with this replaced array accesses) is satisfiable then there is a model for the original formula. Note that the model may be a bit different, for instance the underlying solver can come up with "abcde" but the correct model for the string solver is actually "ab???" (string-max-length = 2).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be safer to introduce an assumption that the length is within bounds instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is what I'm trying to avoid
@@ -1,8 +1,8 @@ | |||
CORE | |||
MemberTest.class | |||
--refine-strings --verbosity 10 --string-max-length 29 --java-assume-inputs-non-null --function MemberTest.main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--function is required if main() doesn't have String[] args
@@ -29,6 +29,7 @@ Author: Alberto Griggio, [email protected] | |||
#include <solvers/refinement/string_refinement_invariant.h> | |||
|
|||
#define DEFAULT_MAX_NB_REFINEMENT std::numeric_limits<size_t>::max() | |||
#define DEFAULT_MAX_STRING_LENGTH (1 << 30) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment that explains that value (it's good to have it in the commit message, but even better to have it in the code)
if(const auto n = numeric_cast<std::size_t>(length)) | ||
const exprt length = super_get(arr.length()); | ||
const auto n = numeric_cast<std::size_t>(length); | ||
if(n.has_value() && *n <= max_string_length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct to make this fail silently?
8d9d9a5
to
c8f9348
Compare
@@ -176,8 +176,10 @@ std::unique_ptr<cbmc_solverst::solvert> cbmc_solverst::get_string_refinement() | |||
info.prop=prop.get(); | |||
info.refinement_bound=DEFAULT_MAX_NB_REFINEMENT; | |||
info.ui=ui; | |||
if(options.get_bool_option("string-max-length")) | |||
info.string_max_length=options.get_signed_int_option("string-max-length"); | |||
info.max_string_length = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mention these defaults in the help text
@@ -689,6 +689,42 @@ void output_equations( | |||
<< " == " << from_expr(ns, "", equations[i].rhs()) << std::endl; | |||
} | |||
|
|||
/// Make character array accesses at index greater than max_index return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be safer to introduce an assumption that the length is within bounds instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing out of the ordinary, though commit messages could be constructed with more care.
These default axioms where too strict and where applied even for strings that may never be creted in the actual program. This could for instance lead to problematic contradiction for a Java object which could be casted either to a string or an int: the axiom added there would apply both on the string and on the integer.
In constraint generator, this was used for adding default axioms but is no longer used.
Having input string longer than string-max-length does not make sense as the solver will not know how to analyse them. So when string-max-input-length is not specified we can use string-max-length instead.
We flush after every output on the output stream, so that if an instruction abort the program we can see what axiom was being processed.
This avoid substituing indexes that are past the length limit up to which we want to refine string.
We can use this to minimize the size of generated formulas when the array_exprt as repetitive values.
When check axioms ask for counter examples, we don't want counter examples that exceeds string-max-length, since these will always be intrepreted as a non-deterministic character.
This is so that any access past the bound of string_max_length is interpreted as a constant. This avoids the underlying solver attempting to assing values to very big indexes in arrays.
We set the value of 2^30 as a default value for string-max-length. It is largely enough for most programs and having a default avoids possible problems with overflows, and running out of memory because of a string that is too large.
Remove test from max_input_length This remove test which was using the way max-input-length was limiting the size of strings, even constant one. The solver is now more permisives and the long strings would just get troncated instead of leading to a solver contradiction. For MemberTest, we add a new test. We now have two tests with different values of the option, which should give different results.
This adds an int greater than string-max-length in a generic array. This is to check that TG-2138 is fixed. Before the issue was solved setting the array to 101 was in contradiction with default axioms added by the string solver.
We should now use string-max-input-length to limit the sizes of strings in the program. Having string-max-length set to something too small can lead to wrong results.
These tests show how the result of the solver can depend on the string-max-input-length and string-max-length parameters.
We use this counter examples in the string solver but the formula given there don't use arrays so it is enough to use boolbvt.
The function could potentially run out of memory when attempting to concretize long strings. Instead of trying to concretize strings for which get_array failed, we return an array_of_exprt.
This is to avoid running out of memory while trying to concretize a very long string.
This ensures we never pass formulas to the underlying solver that would require some constraints on arrays at indexes that are negative or greater than string-max-length.
This is already done in add_lemma.
This was not checking that the instantiate variable does not exceed the upper bound.
This was not adding the correct constraints on the bounds.
c8f9348
to
2a1bc92
Compare
This is on hold at the moment, we should first get #2009 merged then I will see what remains to be done on this (several commits will probably go away). |
This is replaced by #2052 which includes a subset of the commit here (the others either have been merged or are now unnecessary) |
The solver is currently adding default axioms on all the strings based on string-max-length.
This can be a problem if for instance an Object can be cast to String, because the axiom on the string also apply to the original object, independently of whether the branch on which it is cast is reached or not.
This PR removes this axioms.
We still use the string-max-length parameters to avoid the solver getting out of memory by analysing strings that are too big: