Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String Literals Domain #1048

Merged
merged 46 commits into from
Jun 1, 2023
Merged

Conversation

nathanschmidt
Copy link
Collaborator

This PR extends the base analysis with cases for strcpy (adapted), strncpy, strcat, strncat, strlen, strstr, strcmp and strncmp. The following scenarios can now be analyzed:

  • If the parameter dest of strcpy, strncpy, strcat or strncat potentially is a string literal, Goblint will now warn about a possible undefined behavior. For this purpose, the previous implementation of strcpy had to be adapted.
  • If all the potential values of the parameters of the functions strlen, strstr, strcmp and strncmp are string literals, Goblint can now compute the exact return value of these functions.

I also added two regression tests, both with assertions that can now be tracked and assertions for which the String Literals Domain can still not make any assertion.
Note: The regression tests fail on MacOS while they succeed on Linux. I couldn't yet figure out why that is the case.

nathanschmidt and others added 26 commits April 30, 2023 22:57
@sim642 sim642 self-requested a review May 16, 2023 06:33
@sim642 sim642 added the feature label May 16, 2023
@sim642
Copy link
Member

sim642 commented May 20, 2023

@sim642 You had some way to debug OS X things iirc? Do you mind sharing how that worked again?

The CI summary page has a suite_result artifact at the bottom, which contains all the logs from the failed OSX job.

Comment on lines 268 to 270
let to_string_length = function
| StrPtr (Some x) -> Some (String.length x)
| _ -> None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this might not be so easy because C strings and OCaml strings are different. In particular a string literal "hello\0world" would have strlen 5, but the OCaml string has length 11.
The same probably applies to other length-related functionality as well.

Maybe CIL has some helper functions for manipulating strings with C semantics?

Copy link
Collaborator Author

@nathanschmidt nathanschmidt May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's absolutely right, again thank you for thinking about this case! I couldn't find a CIL function that would help me in this case in a quick search, but I'll have another look. Otherwise, String.split_on_char '\x00' x and taking the first element of the resulting list would do the trick.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now fixed for strlen, but I think it still needs to be fixed for some of the other functions. I think it matters for strstr and strcmp as well. Right now they're just performing the operations on OCaml strings, but all of those operations need to be performed with C semantics, so only up to the NUL byte considered when checking for inclusion etc.

I'm wondering if a better solution is to maybe just do the stripping initially, so it doesn't have to be done for every operation individually. Or maybe CIL is already doing that as well?
I haven't checked, but it would be good to have tests for these things. If sv-benchmarks doesn't contain such cases, it would be nice to also contribute those because this detail is easy to get wrong and become unsound.

Copy link
Collaborator Author

@nathanschmidt nathanschmidt May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first assumption was that CIL would cut the string at the first NULL byte, that's why I didn't do the stripping. After your comment, I tested this and found out that's not the case.

As you're right that this also concerns strstr and str(n)cmp, I introduced a new function to_c_string performing the stripping which I now use accross all my other functions. I didn't change to_string as some parts of the base analysis not related to my PR rely on it.

I added a new 02-string_literals_with_null.c test to verify this edge case is now correctly covered.

@michael-schwarz michael-schwarz requested a review from sim642 May 24, 2023 14:13
@michael-schwarz
Copy link
Member

@sim642 do you want to do a final pass or are you good with merging?

@michael-schwarz michael-schwarz requested review from sim642 and removed request for sim642 May 31, 2023 14:02
@sim642 sim642 merged commit 9b14dde into goblint:master Jun 1, 2023
@sim642 sim642 added this to the v2.2.0 milestone Jun 1, 2023
sim642 added a commit to sim642/opam-repository that referenced this pull request Sep 13, 2023
CHANGES:

* Add `setjmp`/`longjmp` analysis (goblint/analyzer#887, goblint/analyzer#970, goblint/analyzer#1015, goblint/analyzer#1019).
* Refactor race analysis to lazy distribution (goblint/analyzer#1084, goblint/analyzer#1089, goblint/analyzer#1136, goblint/analyzer#1016).
* Add thread-unsafe library function call analysis (goblint/analyzer#723, goblint/analyzer#1082).
* Add mutex type analysis and mutex API analysis (goblint/analyzer#800, goblint/analyzer#839, goblint/analyzer#1073).
* Add interval set domain and string literals domain (goblint/analyzer#901, goblint/analyzer#966, goblint/analyzer#994, goblint/analyzer#1048).
* Add affine equalities analysis (goblint/analyzer#592).
* Add use-after-free analysis (goblint/analyzer#1050, goblint/analyzer#1114).
* Add dead code elimination transformation (goblint/analyzer#850, goblint/analyzer#979).
* Add taint analysis for partial contexts (goblint/analyzer#553, goblint/analyzer#952).
* Add YAML witness validation via unassume (goblint/analyzer#796, goblint/analyzer#977, goblint/analyzer#1044, goblint/analyzer#1045, goblint/analyzer#1124).
* Add incremental analysis rename detection (goblint/analyzer#774, goblint/analyzer#777).
* Fix address sets unsoundness (goblint/analyzer#822, goblint/analyzer#967, goblint/analyzer#564, goblint/analyzer#1032, goblint/analyzer#998, goblint/analyzer#1031).
* Fix thread escape analysis unsoundness (goblint/analyzer#939, goblint/analyzer#984, goblint/analyzer#1074, goblint/analyzer#1078).
* Fix many incremental analysis issues (goblint/analyzer#627, goblint/analyzer#836, goblint/analyzer#835, goblint/analyzer#841, goblint/analyzer#932, goblint/analyzer#678, goblint/analyzer#942, goblint/analyzer#949, goblint/analyzer#950, goblint/analyzer#957, goblint/analyzer#955, goblint/analyzer#954, goblint/analyzer#960, goblint/analyzer#959, goblint/analyzer#1004, goblint/analyzer#558, goblint/analyzer#1010, goblint/analyzer#1091).
* Fix server mode for abstract debugging (goblint/analyzer#983, goblint/analyzer#990, goblint/analyzer#997, goblint/analyzer#1000, goblint/analyzer#1001, goblint/analyzer#1013, goblint/analyzer#1018, goblint/analyzer#1017, goblint/analyzer#1026, goblint/analyzer#1027).
* Add documentation for configuration JSON schema and OCaml API (goblint/analyzer#999, goblint/analyzer#1054, goblint/analyzer#1055, goblint/analyzer#1053).
* Add many library function specifications (goblint/analyzer#962, goblint/analyzer#996, goblint/analyzer#1028, goblint/analyzer#1079, goblint/analyzer#1121, goblint/analyzer#1135, goblint/analyzer#1138).
* Add OCaml 5.0 support (goblint/analyzer#1003, goblint/analyzer#945, goblint/analyzer#1162).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants