Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TG-3173] Assign Strings at malloc site in java object factory #2038

Merged

Conversation

romainbrenguier
Copy link
Contributor

This appears to be much more efficient than assigning to the object in the case the
object is an element of an array.

This is much more efficient than assigning to the object in the case the
object is an element of an array.
This checks that jbmc correctly analyze functions that take arrays of
strings as inputs.
With the changes assigning strings directly at malloc site, this example
can be analyzed by jbmc approximately 15 times faster.
@romainbrenguier romainbrenguier force-pushed the bugfix/assign-at-malloc-site branch from aa09605 to c207106 Compare April 11, 2018 07:18
@romainbrenguier romainbrenguier changed the title Assign Strings at malloc site in java object factory TG-3173 [TG-3173] Assign Strings at malloc site in java object factory Apr 11, 2018
@romainbrenguier romainbrenguier requested a review from LAJW April 11, 2018 08:40
@romainbrenguier romainbrenguier merged commit 513b67a into diffblue:develop Apr 11, 2018
@romainbrenguier romainbrenguier deleted the bugfix/assign-at-malloc-site branch April 11, 2018 08:59
@tautschnig
Copy link
Collaborator

My apologies for being (too) late to the party, but I take issue with this PR. Here's why:

  1. The commit message says "much more efficient" - that's weasel words, how much more? The second commit says "15 times", but what are the absolute numbers? Going from 0.15ms to 0.1ms is measurement inaccuracy, going from 15h to 1h is a clearly noticeable difference.
  2. The regression test is not immediately useful as such: this seems to have worked before, so maybe we could regress back into a worse state without noticing? A regression test must actually check for something. From the commit messages I'd guess that one would notice a regression because running the regression test suite would take much longer? How much longer? I use many different systems, some of which are pretty slow, would I notice that regression or would I just blame it on my computer being slow?
  3. I have tried to figure out what the difference in the generated programs is. I may be doing this very wrong due to my poor knowledge of the Java frontend, but here's what I got by running jbmc --show-goto-functions --refine-strings --function Test.check --string-max-input-length 1000 Test.class (please tell me if that isn't telling the full story):
--- before.txt	2018-04-11 10:06:23.289256817 +0000
+++ after.txt	2018-04-11 10:26:05.663043314 +0000
@@ -985,7 +985,7 @@
         // 438 file Test.java line 5 function java::Test.check:([Ljava/lang/String;)Ljava/lang/String;
         return_array$34 = cprover_associate_length_to_array_func(*string_data_pointer$31, tmp_object_factory$30);
         // 439 no location
-        *array_data_init$27[array_init_iter$28] = { [email protected]={ .@class_identifier="java::java.lang.String",
+        *malloc_site$29 = { [email protected]={ .@class_identifier="java::java.lang.String",
     .@lock=false },
     .length=tmp_object_factory$30, .data=*string_data_pointer$31 };
         // 440 no location

Ok, so that seems to fit the code changes and the commit messages: rather than relying on the alias analysis (array_data_init$27[array_init_iter$28] is earlier set to malloc_site$29), which may over-approximate and thus generate assignments to some invalid_object array, just one level of indirection is used. It seems reasonable that this would improve performance. Is that a correct analysis of why this improves performance?
4. Looking at the resulting goto program, array_data_init$27 only ever appears on the left-hand side, and none of these involve dereferencing. Thus (but again I may not have used jbmc correctly) it just seems to be adding overhead and should be removed completely? But then also malloc_site$29 never appears on the right-hand side, so it seems to be equally useless? Is that just because it's such a limited regression test?

@romainbrenguier
Copy link
Contributor Author

@tautschnig

  1. The commit message says "much more efficient" - that's weasel words, how much more? The second commit says "15 times", but what are the absolute numbers? Going from 0.15ms to 0.1ms is measurement inaccuracy, going from 15h to 1h is a clearly noticeable difference.

On my computer this was 7s against 0.4s.

  1. The regression test is not immediately useful as such: this seems to have worked before, so maybe we could regress back into a worse state without noticing? A regression test must actually check for something. From the commit messages I'd guess that one would notice a regression because running the regression test suite would take much longer? How much longer? I use many different systems, some of which are pretty slow, would I notice that regression or would I just blame it on my computer being slow?

Yes, I know this is not ideal, but couldn't find a good solution for checking that performances do not degrade. The test at least checks that we haven't broken things. Do you have suggestions for a better check?

Ok, so that seems to fit the code changes and the commit messages: rather than relying on the alias analysis (array_data_init$27[array_init_iter$28] is earlier set to malloc_site$29), which may over-approximate and thus generate assignments to some invalid_object array, just one level of indirection is used. It seems reasonable that this would improve performance. Is that a correct analysis of why this improves performance?

Yes that's a possible explanation. What I observed is that with the 'before' version some unecessary equations where generated:

 {-96} dynamic_object9#1 == (dynamic_2_array#11[(24l + POINTER_OFFSET(array_data_init$27!0#2)) / 8l] == &dynamic_object9 ? { [email protected]={ .@class_identifier="java::java.lang.String", .@lock=FALSE }, .length=tmp_object_factory$30!0#11,
    .data=dynamic_object10 } : dynamic_object9#0)
{-97} dynamic_object7#2 == (dynamic_2_array#11[(24l + POINTER_OFFSET(array_data_init$27!0#2)) / 8l] == &dynamic_object7 && !(dynamic_2_array#11[(24l + POINTER_OFFSET(array_data_init$27!0#2)) / 8l] == &dynamic_object9) ? { [email protected]={ .@class_identifier="java::java.lang.String", .@lock=FALSE }, .length=tmp_object_factory$30!0#11,
    .data=dynamic_object10 } : dynamic_object7#1)
{-98} dynamic_object5#3 == (dynamic_2_array#11[(24l + POINTER_OFFSET(array_data_init$27!0#2)) / 8l] == &dynamic_object5 && !(dynamic_2_array#11[(24l + POINTER_OFFSET(array_data_init$27!0#2)) / 8l] == &dynamic_object7) && !(dynamic_2_array#11[(24l + POINTER_OFFSET(array_data_init$27!0#2)) / 8l] == &dynamic_object9) ? { [email protected]={ .@class_identifier="java::java.lang.String", .@lock=FALSE }, .length=tmp_object_factory$30!0#11,
    .data=dynamic_object10 } : dynamic_object5#2)
{-99} dynamic_object3#4 == (dynamic_2_array#11[(24l + POINTER_OFFSET(array_data_init$27!0#2)) / 8l] == &dynamic_object3 && !(dynamic_2_array#11[(24l + POINTER_OFFSET(array_data_init$27!0#2)) / 8l] == &dynamic_object5) && !(dynamic_2_array#11[(24l + POINTER_OFFSET(array_data_init$27!0#2)) / 8l] == &dynamic_object7) && !(dynamic_2_array#11[(24l + POINTER_OFFSET(array_data_init$27!0#2)) / 8l] == &dynamic_object9) ? { [email protected]={ .@class_identifier="java::java.lang.String", .@lock=FALSE }, .length=tmp_object_factory$30!0#11,
    .data=dynamic_object10 } : dynamic_object3#3)
{-100} invalid_object3#1 == (!(dynamic_2_array#11[(24l + POINTER_OFFSET(array_data_init$27!0#2)) / 8l] == &dynamic_object3) && !(dynamic_2_array#11[(24l + POINTER_OFFSET(array_data_init$27!0#2)) / 8l] == &dynamic_object5) && !(dynamic_2_array#11[(24l + POINTER_OFFSET(array_data_init$27!0#2)) / 8l] == &dynamic_object7) && !(dynamic_2_array#11[(24l + POINTER_OFFSET(array_data_init$27!0#2)) / 8l] == &dynamic_object9) ? { [email protected]={ .@class_identifier="java::java.lang.String", .@lock=FALSE }, .length=tmp_object_factory$30!0#11,
    .data=dynamic_object10 } : invalid_object3#0)

instead of just

dynamic_object9#1 == { [email protected]={ .@class_identifier="java::java.lang.String", .@lock=FALSE }, .length=tmp_object_factory$30!0#11,
    .data=dynamic_object10 }

which makes the rest of the analysis more complicated

  1. Looking at the resulting goto program, array_data_init$27 only ever appears on the left-hand side, and none of these involve dereferencing. Thus (but again I may not have used jbmc correctly) it just seems to be adding overhead and should be removed completely? But then also malloc_site$29 never appears on the right-hand side, so it seems to be equally useless? Is that just because it's such a limited regression test?

The first assignment of array_data_init$27 is array_data_init$27 = (struct java.lang.String **)arg0a->data; so the modifications we do afterwards affect arg0a which is the input of our function.

@tautschnig
Copy link
Collaborator

Thanks a lot for all the input!

Yes, I know this is not ideal, but couldn't find a good solution for checking that performances do not degrade. The test at least checks that we haven't broken things. Do you have suggestions for a better check?

How about you add another x.desc file that includes --show-goto-functions and then checks for the absence of \*array_data_init\$ (that is, no dereferencing on array_data_init? Alternatively you could do --show-vcc and check for absence of the lines you raised, but I think that's harder to do without ambiguity.

The first assignment of [...]

My apologies for missing that, makes of course sense and does the dereferencing (via indexed access) that I wrongly claimed was not happening.

smowton added a commit to smowton/cbmc that referenced this pull request May 9, 2018
ad62682 Merge pull request diffblue#2071 from thk123/refactor/split-string-unit-tests
fc8ba88 Revert to aborting precondition for function inputs
3e2ab6f Merge pull request diffblue#2080 from diffblue/java-bytecode-dependency
6ff1eec cbmc: remove dependency on java_bytecode
0bff19b Merge pull request diffblue#2049 from karkhaz/kk-factor-goto-model-processing
79e3b25 Merge pull request diffblue#2084 from tautschnig/has_subtype-test
cd45b0b Test has_subtype on recursive data types
85ac315 Merge pull request diffblue#2082 from thomasspriggs/default_dstring_hash
28c2e8b Merge pull request diffblue#2065 from tautschnig/be-constructor
afa6023 Merge pull request diffblue#2061 from tautschnig/simplify-extractbits
014d151 Factor out getting & processing goto-model
06b3adc Merge pull request diffblue#2077 from peterschrammel/stable-tmp-var-names
0b3170d Stabilize clinit wrapper function type parameters
3cd8bf4 Temporary vars tests for goto-diff
9f0626c  Prefix temporary var names with function id
ca678aa More permissive regression tests regarding tmp var suffixes
47951ca Merge pull request diffblue#2079 from romainbrenguier/bugfix/has-subtype-recursion
dd73b1a Specify default hash function of `dstringt` to STL.
fe8e589 Avoid infinite recursion in has_subtype
00b9bf6 Merge pull request diffblue#2051 from svorenova/generics_tg2717
cd4bfc3 Merge pull request diffblue#2078 from romainbrenguier/bool-literal-in-while-loop
67ea889 Use bool literal in while loop
d229ad9 Merge pull request diffblue#2056 from diffblue/fix-regression-cbmc-memcpy1
506faf0 Refactor a function for base existence
617d388 Utility functions for generic types
c07e6ca Update generic specialization map when replacing pointers
ed26d0a Merge pull request diffblue#2058 from peterschrammel/stable-disjuncts
b663734 Simplify extractbits(concatenation(...))
b091560 Typing and refactoring of simplify_extractbits
49ad1bd Merge pull request diffblue#974 from tautschnig/fix-assert-encoding
16e9599 Merge pull request diffblue#2063 from tautschnig/has-subtype
950f58b Merge pull request diffblue#2060 from tautschnig/opt-local-map
4222a94 Regression tests for unstable instanceof and virtual method disjuncts
b44589e Make disjuncts in instanceof removal independent of class loading
3afff86 Make disjuncts in virtual method removal independent of class loading
a385d9b Allowed split_string to be used on whitespace if not also trying to strip
fe4a642 Merge pull request diffblue#2062 from tautschnig/no-has-deref
145f474 Adding tests for empty string edge cases
07009d4 Refactored test to run all combinations
252c24c Migrate old string utils unit tests
e87edbf Removing wrong elements from the make file
b165c52 make test work on 32-bit Linux
b804570 Merge pull request diffblue#2048 from JohnDumbell/improvement/adding_null_object_id
61f14d8 Merge pull request diffblue#1962 from owen-jones-diffblue/owen-jones-diffblue/simplify-replace-java-nondet
fdee7e8 Merge pull request diffblue#2059 from tautschnig/generalise-test
4625cc5 Extend global has_subexpr to take a predicate as has_subtype does
e9ebd59 has_subtype(type, pred, ns) to search for contained type matching pred
1f1f67f Merge pull request diffblue#1889 from hannes-steffenhagen-diffblue/develop-feature_generate_function_bodies
048c188 Add unit test for java_replace_nondet
0fe48c9 Make remove_java_nondet work before remove_returns
bcc4dc4 Use byte_extract_exprt constructor
a1814a3 Get rid of thin (and duplicate) has_dereference wrapper
4122a28 Test to demonstrate assert bug on alpine
d44bfd3 Also simplify assert structure found on alpine linux
c5cde18 Do not generate redundant if statements in assert expansions
4fb0603 Make is_skip publicly available and use constant argument
5832ffd Negative numbers should also pass the test
3c23b28 Consistently disable simplify_exprt::local_replace_map
da63652 Merge pull request diffblue#2054 from romainbrenguier/bugfix/clear-equations
d77f6a2 Merge pull request diffblue#1831 from NathanJPhillips/feature/class-annotations
60c8296 Clear string_refinement equations (not dependencies)
314ed53 Correcting the value of ID_null_object
751a882 Factor out default CBMC options to static method
6f24009 Can now test for an option being set in optionst
9a8d937 Add to_annotated_type and enable type_checked_cast for annotated_typet
ca77b4e Add test for added annotations
b06a27d Introduce abstract qualifierst base class
e6fb3bf Pretty printing of java_class_typet
e22b95b Fix spurious virtual function related keywords
3ac6d17 Add type_dynamic_cast and friends for java_class_typet
ce1f4d2 Add annotations to java_class_typet, its methods and fields
f84753d Merge pull request diffblue#2042 from hannes-steffenhagen-diffblue/add_deprecate_macro
7a38669 Merge pull request diffblue#2017 from NathanJPhillips/feature/overlay-classes
75a4aec Revert "the deprecation will need to wait until codebase is clean"
67735b5 Disable deprecation warnings by default
0764f77 Merge pull request diffblue#2036 from romainbrenguier/id_array_list
690b606 Merge pull request diffblue#2039 from peterschrammel/fix-duplicate-msg-json-ui
bba17d9 the deprecation will need to wait until codebase is clean
822c757 Regression test for redundant JSON message output
de0644d Only force end of previous message if there actually is one.
5a637bf Merge pull request diffblue#2037 from hannes-steffenhagen-diffblue/add_deprecate_macro
bff456a Merge pull request diffblue#2040 from tautschnig/remove-swp
87ebe90 Remove vim temp file
228c019 Fix duplicate message output in json-ui
0a2c43e Add DEPRECATED to functions documented as \deprecated
47f4b35 interval_sparse_arrayt constructor from array-list
026c4ca Declare an array_list_exprt class
50a2696 Define ID_array_list
513b67a Merge pull request diffblue#2038 from romainbrenguier/bugfix/assign-at-malloc-site
c207106 Test with array of strings
60183a3 Assign string at malloc site
116fffd Add DEPRECATED macro to mark deprecated functions and variables
7952f2c Add option to generate function body to goto-instrument
3d4183a Add ability to overlay classes with new definitions of existing methods
dbc2f71 Improve code and error message in infer_opaque_type_fields
7c0ea4d Tidied up java_class_loader_limitt

git-subtree-dir: cbmc
git-subtree-split: ad62682
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants