Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TG-2138] Remove default axioms for strings in string solver #1873

Conversation

romainbrenguier
Copy link
Contributor

@romainbrenguier romainbrenguier commented Feb 22, 2018

The solver is currently adding default axioms on all the strings based on string-max-length.
This can be a problem if for instance an Object can be cast to String, because the axiom on the string also apply to the original object, independently of whether the branch on which it is cast is reached or not.
This PR removes this axioms.
We still use the string-max-length parameters to avoid the solver getting out of memory by analysing strings that are too big:

  • access to strings at index greater than string-max-length are replaced by a default character
  • if the valuation of a string in the model that is found is too long, characters are given a default value
  • if it is not given string-max-input-length will be given the string-max-length as default, and if string-max-length is not given, it is 2^30 by default

@romainbrenguier romainbrenguier force-pushed the bugfix/get-rid-of-default-axioms#TG-2138 branch from acf7b10 to a69c472 Compare February 22, 2018 12:44
@romainbrenguier romainbrenguier changed the title Remove default axioms for strings in string solver [TG-2138] Remove default axioms for strings in string solver Feb 22, 2018
@romainbrenguier romainbrenguier force-pushed the bugfix/get-rid-of-default-axioms#TG-2138 branch 4 times, most recently from f7494f9 to 26ca5b1 Compare February 23, 2018 10:04
Copy link
Collaborator

@tautschnig tautschnig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure all of the commits relate to the subject of the PR?

Wishlist item: please spell-check your commit messages. It's unfortunate that GitHub doesn't permit commenting on commit messages, else I'd provide more detailed comments on those.

stream << "(sr::get_array) returning a dummy string" << eom;
return array_of_exprt(
from_integer(CHARACTER_FOR_UNKNOWN, char_type),
array_typet(char_type, from_integer(n, index_type)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit "Return a dummy string when string is too long " - is there a test for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean a unit test? Because it would be difficult to test with end-to-end tests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reading of this (but I could be wrong) is that the behaviour changes with the above patch. If so, it would be nice to have a test that covers this change. If there already is one, then it seems surprising that none of the tests needed to be changed. If there is none, create one - whether a unit test or a regression test is up to you, I don't have sufficient insight.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm dropping this commit. The problem it was trying to avoid is that sometimes get would run out of memory trying to concretize some long strings, so get is a more appropriate place to address that (see the commit I added 8d9d9a5).
This issue was occurring in one of tests (jbmc-strings/ValueOf02) because without the default axioms, non deterministic strings that are not used in the program can be given very big length.

@@ -188,8 +188,7 @@ SCENARIO("instantiate_not_contains",
// Generating the corresponding axioms and simplifying, recording info
symbol_tablet symtab;
const namespacet empty_ns(symtab);
string_constraint_generatort::infot info;
string_constraint_generatort generator(info, ns);
string_constraint_generatort generator(ns);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a change necessitated by one of the earlier commits, please merge it. But it might also just be cleanup, don't really know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

squashed

@@ -1793,7 +1795,7 @@ static std::pair<bool, std::vector<exprt>> check_axioms(
const exprt with_concrete_arrays =
substitute_array_access(negaxiom, gen_symbol, true);

stream << indent << i << ".\n";
stream << indent << i << '.' << eom;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just observing (but not fully understanding) that you use messaget::eom above while here you seem to be getting away with just eom. Maybe that's the case above as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this function we defined const auto eom=messaget::eom;, that's why we can use it here like that.

@romainbrenguier romainbrenguier force-pushed the bugfix/get-rid-of-default-axioms#TG-2138 branch 3 times, most recently from 9ae8079 to 8d9d9a5 Compare February 23, 2018 16:15
@@ -2428,13 +2431,13 @@ exprt string_refinementt::get(const exprt &expr) const
array_string_exprt &arr = to_array_string_expr(ecopy);
arr.length() = generator.get_length_of_string_array(arr);
const auto arr_model_opt =
get_array(super_get, ns, generator.max_string_length, debug(), arr);
get_array(super_get, ns, max_string_length, debug(), arr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about the commit message. It seems to be still used here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_string_length is still used in the solver but not in the constraint generator.

@@ -119,15 +119,16 @@ static std::vector<exprt> instantiate(
static optionalt<exprt> get_array(
const std::function<exprt(const exprt &)> &super_get,
const namespacet &ns,
const std::size_t max_string_length,
std::size_t max_string_length,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear why some parameters are const and others aren't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the parameter is not a reference, const only has an effect at the function definition and not at declaration.

@@ -689,6 +689,42 @@ void output_equations(
<< " == " << from_expr(ns, "", equations[i].rhs()) << std::endl;
}

/// Make character array accesses at index greater than max_index return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the consequences of this? Do you detect at a later stage that such an access has happened?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is like constraining the arrays to have all the same character after some index, so if the formula we constructed (with this replaced array accesses) is satisfiable then there is a model for the original formula. Note that the model may be a bit different, for instance the underlying solver can come up with "abcde" but the correct model for the string solver is actually "ab???" (string-max-length = 2).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be safer to introduce an assumption that the length is within bounds instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what I'm trying to avoid

@@ -1,8 +1,8 @@
CORE
MemberTest.class
--refine-strings --verbosity 10 --string-max-length 29 --java-assume-inputs-non-null --function MemberTest.main
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--function is required if main() doesn't have String[] args

@@ -29,6 +29,7 @@ Author: Alberto Griggio, [email protected]
#include <solvers/refinement/string_refinement_invariant.h>

#define DEFAULT_MAX_NB_REFINEMENT std::numeric_limits<size_t>::max()
#define DEFAULT_MAX_STRING_LENGTH (1 << 30)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment that explains that value (it's good to have it in the commit message, but even better to have it in the code)

if(const auto n = numeric_cast<std::size_t>(length))
const exprt length = super_get(arr.length());
const auto n = numeric_cast<std::size_t>(length);
if(n.has_value() && *n <= max_string_length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct to make this fail silently?

@romainbrenguier romainbrenguier force-pushed the bugfix/get-rid-of-default-axioms#TG-2138 branch from 8d9d9a5 to c8f9348 Compare February 26, 2018 07:50
@@ -176,8 +176,10 @@ std::unique_ptr<cbmc_solverst::solvert> cbmc_solverst::get_string_refinement()
info.prop=prop.get();
info.refinement_bound=DEFAULT_MAX_NB_REFINEMENT;
info.ui=ui;
if(options.get_bool_option("string-max-length"))
info.string_max_length=options.get_signed_int_option("string-max-length");
info.max_string_length =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention these defaults in the help text

@@ -689,6 +689,42 @@ void output_equations(
<< " == " << from_expr(ns, "", equations[i].rhs()) << std::endl;
}

/// Make character array accesses at index greater than max_index return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be safer to introduce an assumption that the length is within bounds instead?

Copy link
Contributor

@LAJW LAJW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing out of the ordinary, though commit messages could be constructed with more care.

These default axioms where too strict and where applied even for strings
that may never be creted in the actual program.
This could for instance lead to problematic contradiction for a Java
object which could be casted either to a string or an int:
the axiom added there would apply both on the string and on the integer.
In constraint generator, this was used for adding default axioms but is
no longer used.
Having input string longer than string-max-length does not make sense as
the solver will not know how to analyse them.
So when string-max-input-length is not specified we can use
string-max-length instead.
We flush after every output on the output stream, so that if an
instruction abort the program we can see what axiom was being processed.
This avoid substituing indexes that are past the length limit up to
which we want to refine string.
We can use this to minimize the size of generated formulas when the
array_exprt as repetitive values.
When check axioms ask for counter examples, we don't want counter
examples that exceeds string-max-length, since these will always be
intrepreted as a non-deterministic character.
This is so that any access past the bound of string_max_length is
interpreted as a constant.
This avoids the underlying solver attempting to assing values to very
big indexes in arrays.
We set the value of 2^30 as a default value for string-max-length.
It is largely enough for most programs and having a default avoids
possible problems with overflows, and running out of memory because of a
string that is too large.
Remove test from max_input_length

This remove test which was using the way max-input-length was limiting
the size of strings, even constant one.
The solver is now more permisives and the long strings would just get
troncated instead of leading to a solver contradiction.

For MemberTest, we add a new test.
We now have two tests with different values of the option, which should
give different results.
This adds an int greater than string-max-length in a generic array.
This is to check that TG-2138 is fixed.
Before the issue was solved setting the array to 101 was in
contradiction with default axioms added by the string solver.
We should now use string-max-input-length to limit the sizes of strings
in the program.
Having string-max-length set to something too small can lead to wrong
results.
These tests show how the result of the solver can depend on the
string-max-input-length and string-max-length parameters.
We use this counter examples in the string solver but the formula given
there don't use arrays so it is enough to use boolbvt.
The function could potentially run out of memory when attempting to
concretize long strings.
Instead of trying to concretize strings for which get_array failed, we
return an array_of_exprt.
This is to avoid running out of memory while trying to concretize a
very long string.
This ensures we never pass formulas to the underlying solver that would
require some constraints on arrays at indexes that are negative or
greater than string-max-length.
This is already done in add_lemma.
This was not checking that the instantiate variable does not exceed the
upper bound.
This was not adding the correct constraints on the bounds.
@romainbrenguier
Copy link
Contributor Author

This is on hold at the moment, we should first get #2009 merged then I will see what remains to be done on this (several commits will probably go away).

@romainbrenguier
Copy link
Contributor Author

This is replaced by #2052 which includes a subset of the commit here (the others either have been merged or are now unnecessary)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants