[TG-2138] Remove default axioms for strings in string solver #1873

romainbrenguier · 2018-02-22T10:41:26Z

The solver is currently adding default axioms on all the strings based on string-max-length.
This can be a problem if for instance an Object can be cast to String, because the axiom on the string also apply to the original object, independently of whether the branch on which it is cast is reached or not.
This PR removes this axioms.
We still use the string-max-length parameters to avoid the solver getting out of memory by analysing strings that are too big:

access to strings at index greater than string-max-length are replaced by a default character
if the valuation of a string in the model that is found is too long, characters are given a default value
if it is not given string-max-input-length will be given the string-max-length as default, and if string-max-length is not given, it is 2^30 by default

tautschnig

I'm not entirely sure all of the commits relate to the subject of the PR?

Wishlist item: please spell-check your commit messages. It's unfortunate that GitHub doesn't permit commenting on commit messages, else I'd provide more detailed comments on those.

tautschnig · 2018-02-23T10:33:43Z

src/solvers/refinement/string_refinement.cpp

+    stream << "(sr::get_array) returning a dummy string" << eom;
+    return array_of_exprt(
+      from_integer(CHARACTER_FOR_UNKNOWN, char_type),
+      array_typet(char_type, from_integer(n, index_type)));


Commit "Return a dummy string when string is too long " - is there a test for this?

Do you mean a unit test? Because it would be difficult to test with end-to-end tests.

My reading of this (but I could be wrong) is that the behaviour changes with the above patch. If so, it would be nice to have a test that covers this change. If there already is one, then it seems surprising that none of the tests needed to be changed. If there is none, create one - whether a unit test or a regression test is up to you, I don't have sufficient insight.

Actually I'm dropping this commit. The problem it was trying to avoid is that sometimes get would run out of memory trying to concretize some long strings, so get is a more appropriate place to address that (see the commit I added 8d9d9a5).
This issue was occurring in one of tests (jbmc-strings/ValueOf02) because without the default axioms, non deterministic strings that are not used in the program can be given very big length.

tautschnig · 2018-02-23T10:34:13Z

unit/solvers/refinement/string_constraint_instantiation/instantiate_not_contains.cpp

@@ -188,8 +188,7 @@ SCENARIO("instantiate_not_contains",
    // Generating the corresponding axioms and simplifying, recording info
    symbol_tablet symtab;
    const namespacet empty_ns(symtab);
-    string_constraint_generatort::infot info;
-    string_constraint_generatort generator(info, ns);
+    string_constraint_generatort generator(ns);


If this is a change necessitated by one of the earlier commits, please merge it. But it might also just be cleanup, don't really know.

tautschnig · 2018-02-23T10:36:40Z

src/solvers/refinement/string_refinement.cpp

@@ -1793,7 +1795,7 @@ static std::pair<bool, std::vector<exprt>> check_axioms(
    const exprt with_concrete_arrays =
      substitute_array_access(negaxiom, gen_symbol, true);

-    stream << indent << i << ".\n";
+    stream << indent << i << '.' << eom;


Just observing (but not fully understanding) that you use messaget::eom above while here you seem to be getting away with just eom. Maybe that's the case above as well?

In this function we defined const auto eom=messaget::eom;, that's why we can use it here like that.

peterschrammel · 2018-02-24T11:13:18Z

src/solvers/refinement/string_refinement.cpp

@@ -2428,13 +2431,13 @@ exprt string_refinementt::get(const exprt &expr) const
    array_string_exprt &arr = to_array_string_expr(ecopy);
    arr.length() = generator.get_length_of_string_array(arr);
    const auto arr_model_opt =
-      get_array(super_get, ns, generator.max_string_length, debug(), arr);
+      get_array(super_get, ns, max_string_length, debug(), arr);


I'm confused about the commit message. It seems to be still used here.

max_string_length is still used in the solver but not in the constraint generator.

peterschrammel · 2018-02-24T11:18:03Z

src/solvers/refinement/string_refinement.cpp

@@ -119,15 +119,16 @@ static std::vector<exprt> instantiate(
 static optionalt<exprt> get_array(
  const std::function<exprt(const exprt &)> &super_get,
  const namespacet &ns,
-  const std::size_t max_string_length,
+  std::size_t max_string_length,


Unclear why some parameters are const and others aren't.

If the parameter is not a reference, const only has an effect at the function definition and not at declaration.

peterschrammel · 2018-02-24T11:21:41Z

src/solvers/refinement/string_refinement.cpp

@@ -689,6 +689,42 @@ void output_equations(
           << " == " << from_expr(ns, "", equations[i].rhs()) << std::endl;
 }

+/// Make character array accesses at index greater than max_index return


What are the consequences of this? Do you detect at a later stage that such an access has happened?

This is like constraining the arrays to have all the same character after some index, so if the formula we constructed (with this replaced array accesses) is satisfiable then there is a model for the original formula. Note that the model may be a bit different, for instance the underlying solver can come up with "abcde" but the correct model for the string solver is actually "ab???" (string-max-length = 2).

Would it be safer to introduce an assumption that the length is within bounds instead?

this is what I'm trying to avoid

peterschrammel · 2018-02-24T11:22:39Z

regression/strings-smoke-tests/max_input_length/MemberTest.desc

@@ -1,8 +1,8 @@
 CORE
 MemberTest.class
--refine-strings --verbosity 10 --string-max-length 29 --java-assume-inputs-non-null --function MemberTest.main


--function is required if main() doesn't have String[] args

peterschrammel · 2018-02-24T11:25:23Z

src/solvers/refinement/string_refinement.h

@@ -29,6 +29,7 @@ Author: Alberto Griggio, [email protected]
 #include <solvers/refinement/string_refinement_invariant.h>

 #define DEFAULT_MAX_NB_REFINEMENT std::numeric_limits<size_t>::max()
+#define DEFAULT_MAX_STRING_LENGTH (1 << 30)


add a comment that explains that value (it's good to have it in the commit message, but even better to have it in the code)

peterschrammel · 2018-02-24T11:28:18Z

src/solvers/refinement/string_refinement.cpp

-        if(const auto n = numeric_cast<std::size_t>(length))
+        const exprt length = super_get(arr.length());
+        const auto n = numeric_cast<std::size_t>(length);
+        if(n.has_value() && *n <= max_string_length)


Is it correct to make this fail silently?

smowton · 2018-02-26T09:58:54Z

src/cbmc/cbmc_solvers.cpp

@@ -176,8 +176,10 @@ std::unique_ptr<cbmc_solverst::solvert> cbmc_solverst::get_string_refinement()
  info.prop=prop.get();
  info.refinement_bound=DEFAULT_MAX_NB_REFINEMENT;
  info.ui=ui;
-  if(options.get_bool_option("string-max-length"))
-    info.string_max_length=options.get_signed_int_option("string-max-length");
+  info.max_string_length =


Mention these defaults in the help text

smowton · 2018-02-26T10:00:42Z

src/solvers/refinement/string_refinement.cpp

@@ -689,6 +689,42 @@ void output_equations(
           << " == " << from_expr(ns, "", equations[i].rhs()) << std::endl;
 }

+/// Make character array accesses at index greater than max_index return


Would it be safer to introduce an assumption that the length is within bounds instead?

LAJW

Nothing out of the ordinary, though commit messages could be constructed with more care.

These default axioms where too strict and where applied even for strings that may never be creted in the actual program. This could for instance lead to problematic contradiction for a Java object which could be casted either to a string or an int: the axiom added there would apply both on the string and on the integer.

In constraint generator, this was used for adding default axioms but is no longer used.

Having input string longer than string-max-length does not make sense as the solver will not know how to analyse them. So when string-max-input-length is not specified we can use string-max-length instead.

We flush after every output on the output stream, so that if an instruction abort the program we can see what axiom was being processed.

This avoid substituing indexes that are past the length limit up to which we want to refine string.

We can use this to minimize the size of generated formulas when the array_exprt as repetitive values.

When check axioms ask for counter examples, we don't want counter examples that exceeds string-max-length, since these will always be intrepreted as a non-deterministic character.

This is so that any access past the bound of string_max_length is interpreted as a constant. This avoids the underlying solver attempting to assing values to very big indexes in arrays.

We set the value of 2^30 as a default value for string-max-length. It is largely enough for most programs and having a default avoids possible problems with overflows, and running out of memory because of a string that is too large.

Remove test from max_input_length This remove test which was using the way max-input-length was limiting the size of strings, even constant one. The solver is now more permisives and the long strings would just get troncated instead of leading to a solver contradiction. For MemberTest, we add a new test. We now have two tests with different values of the option, which should give different results.

This adds an int greater than string-max-length in a generic array. This is to check that TG-2138 is fixed. Before the issue was solved setting the array to 101 was in contradiction with default axioms added by the string solver.

We should now use string-max-input-length to limit the sizes of strings in the program. Having string-max-length set to something too small can lead to wrong results.

These tests show how the result of the solver can depend on the string-max-input-length and string-max-length parameters.

We use this counter examples in the string solver but the formula given there don't use arrays so it is enough to use boolbvt.

The function could potentially run out of memory when attempting to concretize long strings. Instead of trying to concretize strings for which get_array failed, we return an array_of_exprt.

This is to avoid running out of memory while trying to concretize a very long string.

This ensures we never pass formulas to the underlying solver that would require some constraints on arrays at indexes that are negative or greater than string-max-length.

This is already done in add_lemma.

This was not checking that the instantiate variable does not exceed the upper bound.

This was not adding the correct constraints on the bounds.

romainbrenguier · 2018-04-06T08:05:31Z

This is on hold at the moment, we should first get #2009 merged then I will see what remains to be done on this (several commits will probably go away).

romainbrenguier · 2018-04-13T06:44:57Z

This is replaced by #2052 which includes a subset of the commit here (the others either have been merged or are now unnecessary)

romainbrenguier requested review from cesaro, kroening, martin-cs, mgudemann, peterschrammel, smowton, tautschnig and thk123 as code owners February 22, 2018 10:41

romainbrenguier requested review from LAJW and allredj February 22, 2018 10:41

romainbrenguier added do not merge String solver and removed do not merge labels Feb 22, 2018

romainbrenguier force-pushed the bugfix/get-rid-of-default-axioms#TG-2138 branch from acf7b10 to a69c472 Compare February 22, 2018 12:44

romainbrenguier changed the title ~~Remove default axioms for strings in string solver~~ [TG-2138] Remove default axioms for strings in string solver Feb 22, 2018

romainbrenguier force-pushed the bugfix/get-rid-of-default-axioms#TG-2138 branch 4 times, most recently from f7494f9 to 26ca5b1 Compare February 23, 2018 10:04

tautschnig reviewed Feb 23, 2018

View reviewed changes

romainbrenguier force-pushed the bugfix/get-rid-of-default-axioms#TG-2138 branch 3 times, most recently from 9ae8079 to 8d9d9a5 Compare February 23, 2018 16:15

peterschrammel reviewed Feb 24, 2018

View reviewed changes

romainbrenguier force-pushed the bugfix/get-rid-of-default-axioms#TG-2138 branch from 8d9d9a5 to c8f9348 Compare February 26, 2018 07:50

smowton reviewed Feb 26, 2018

View reviewed changes

LAJW approved these changes Feb 26, 2018

View reviewed changes

romainbrenguier added 3 commits March 1, 2018 08:34

Get rid of string_max_length field

163250a

In constraint generator, this was used for adding default axioms but is no longer used.

Use string-max-length as default max input-length

c303197

Having input string longer than string-max-length does not make sense as the solver will not know how to analyse them. So when string-max-input-length is not specified we can use string-max-length instead.

romainbrenguier added 19 commits March 1, 2018 08:34

Give indications in case of error in solver

b233d44

Improve debugging in string solver

fcb3e4d

We flush after every output on the output stream, so that if an instruction abort the program we can see what axiom was being processed.

Add max_index to substitute array function params

b817de5

This avoid substituing indexes that are past the length limit up to which we want to refine string.

Initialize interval_sparse_array from array_exprt

57140f1

We can use this to minimize the size of generated formulas when the array_exprt as repetitive values.

Ensure counter-example is smaller than max-length

2323da2

When check axioms ask for counter examples, we don't want counter examples that exceeds string-max-length, since these will always be intrepreted as a non-deterministic character.

Add guard to array access in instanciated formulas

715837b

This is so that any access past the bound of string_max_length is interpreted as a constant. This avoids the underlying solver attempting to assing values to very big indexes in arrays.

Small refactoring in string_refinement instantiate

4299659

Define a sensible default for string-max-length

93a3f20

We set the value of 2^30 as a default value for string-max-length. It is largely enough for most programs and having a default avoids possible problems with overflows, and running out of memory because of a string that is too large.

Add test for generics array and strings

fccdff1

This adds an int greater than string-max-length in a generic array. This is to check that TG-2138 is fixed. Before the issue was solved setting the array to 101 was in contradiction with default axioms added by the string solver.

Use string-max-input-length in tests

0ea2af6

We should now use string-max-input-length to limit the sizes of strings in the program. Having string-max-length set to something too small can lead to wrong results.

Add tests showing the effect on string-max-length

54fc9cc

These tests show how the result of the solver can depend on the string-max-input-length and string-max-length parameters.

Use boolbvt for getting counter examples

c4b2434

We use this counter examples in the string solver but the formula given there don't use arrays so it is enough to use boolbvt.

Ensure get does not concretize long strings

79c218a

The function could potentially run out of memory when attempting to concretize long strings. Instead of trying to concretize strings for which get_array failed, we return an array_of_exprt.

Do not concretize past string_max_length

6a0e9fe

This is to avoid running out of memory while trying to concretize a very long string.

Make add_lemma add guard to array accesses

d0412d5

This ensures we never pass formulas to the underlying solver that would require some constraints on arrays at indexes that are negative or greater than string-max-length.

Remove unecessary replace_expr

d845a0a

This is already done in add_lemma.

Correct bounds in instantiate method

569edb5

This was not checking that the instantiate variable does not exceed the upper bound.

Correct instantiation of counter example

2a1bc92

This was not adding the correct constraints on the bounds.

romainbrenguier force-pushed the bugfix/get-rid-of-default-axioms#TG-2138 branch from c8f9348 to 2a1bc92 Compare March 1, 2018 17:30

romainbrenguier added the In progress label Mar 6, 2018

tautschnig assigned romainbrenguier Apr 5, 2018

romainbrenguier mentioned this pull request Apr 13, 2018

[TG-2138] Stop adding default axioms in string solver #2052

Merged

romainbrenguier closed this Apr 13, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[TG-2138] Remove default axioms for strings in string solver #1873

[TG-2138] Remove default axioms for strings in string solver #1873

romainbrenguier commented Feb 22, 2018 •

edited

Loading

tautschnig left a comment

tautschnig Feb 23, 2018

romainbrenguier Feb 23, 2018

tautschnig Feb 23, 2018

romainbrenguier Feb 23, 2018

tautschnig Feb 23, 2018

romainbrenguier Feb 23, 2018

tautschnig Feb 23, 2018

romainbrenguier Feb 23, 2018

peterschrammel Feb 24, 2018

romainbrenguier Feb 26, 2018

peterschrammel Feb 24, 2018

romainbrenguier Feb 26, 2018

peterschrammel Feb 24, 2018

romainbrenguier Feb 26, 2018

smowton Feb 26, 2018

romainbrenguier Feb 26, 2018

peterschrammel Feb 24, 2018

peterschrammel Feb 24, 2018

peterschrammel Feb 24, 2018

smowton Feb 26, 2018

smowton Feb 26, 2018

LAJW left a comment

romainbrenguier commented Apr 6, 2018

romainbrenguier commented Apr 13, 2018

[TG-2138] Remove default axioms for strings in string solver #1873

[TG-2138] Remove default axioms for strings in string solver #1873

Conversation

romainbrenguier commented Feb 22, 2018 • edited Loading

tautschnig left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LAJW left a comment

Choose a reason for hiding this comment

romainbrenguier commented Apr 6, 2018

romainbrenguier commented Apr 13, 2018

romainbrenguier commented Feb 22, 2018 •

edited

Loading