Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Taint analysis to improve the precision loss of partial contexts #952

Merged
merged 13 commits into from
Feb 16, 2023

Conversation

FelixKrayer
Copy link
Contributor

Adding a new analysis taintPartialContexts to track modified lvalues per function. The gained information is used in base, relationAnalysis.apron, condVars and varEq to keep unwritten values. (Issue #553)

The analysis taintPartialContexts tracks a set of modified values for each function, starting with the empty set when entering a function. This set can be accessed via a new query MayBeTainted, which is used in the combine of the four mentioned analyses to improve precision when partial contexts are used.

  • base: the cpa from the caller is updated:
    • everything is removed, which does not exist in the callee anymore
    • everything is added from callee, which was not in caller before (new info from callee is always better)
    • folding over the tainted set, caller is updated lval-wise for every lval that is tainted
  • relationAnalysis.apron: from the caller only the globals and reachable locals are removed, which are in the tainted set. The resulting state is unified with the callee state (only making the caller state more precise)
  • condVars: only untainted information is kept. Callee state is not merged.
  • varEq: only untainted information is kept from caller. Callee state is merged

A necessary but noteworthy change was the addition of the (f_ask : Queries.ask) to the signature of the combine function, which allows querying the return state of the called function.

Regression tests for each analysis have been added as well as some for specific edge cases.

@michael-schwarz
Copy link
Member

Thank you for the PR! I'll most likely only get around to review it in the first week of January.

Copy link
Member

@michael-schwarz michael-schwarz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this small typo is fixed, it looks good to me!

Copy link
Member

@sim642 sim642 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually the taintPartialContexts analysis is quite similar to our access analysis: recording accessed lvals. It probably would be possible to merge the two, but that's not necessary now.

Comment on lines 2253 to 2260
| Some (`Array _) when (get_string "ana.base.arrays.domain") = "partitioned" -> begin
(* partitioned arrays cannot be copied by individual lvalues, so if tainted just copy the whole callee value for the array variable *)
let new_arry_opt = CPA.find_opt v fun_st.cpa in
match new_arry_opt with
| None -> st
| Some new_arry -> {st with cpa = CPA.add v new_arry st.cpa}
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just wondering, is there anything that could be done to avoid having to special case this outside of the array domain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be possible to taint the whole array when a lval inside a partitioned array is written.
However special casing this inside the taintPartialContext analysis does not seem like a better approach, as there might be other analyses that can benefit from the finer taint information for partitioned arrays.

Other than that I was not able to come up with a better idea to address this issue.

if M.tracing then M.trace "taintPC" "updating %a; type: %a\n" Lval.CilLval.pretty (v, o) d_type lval_type;
match CPA.find_opt v (fun_st.cpa) with
| None -> st
| Some (`Array _) when (get_string "ana.base.arrays.domain") = "partitioned" -> begin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the latest version, some arrays may be partitioned, while others are not: One would need to check for this specific array here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have fixed this by adding domain_of_t to the signature of ArrayDomain.S and using this to check if the Array is partitioned:
| Some ('Array a) when (CArrays.domain_of_t a) = PartitionedDomain ->

A different approach I thought of was to use ArrayDomain.get_domain:
(ArrayDomain.get_domain ~varAttr:v.vattr ~typAttr:(typeAttrs lval_type)) = PartitionedDomain

I chose the first idea as it seemed more straightforward to "ask the array what it is" instead of inspecting the attributes, but I will of course change the approach if you prefer not adding a function to the ArrayDomain.S signature.

Copy link
Member

@sim642 sim642 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has there been an SV-COMP run with this analysis enabled? Given that a regression test from sv-benchmarks has been adapted, I would guess so?

@michael-schwarz
Copy link
Member

Yes, here they are: results-felix-all.zip.

The first run is: base ctx_insens, no taint
The second one is: base ctc_insens, taint
The third one is: base ctc_sens, no taint

@sim642 sim642 self-requested a review February 9, 2023 10:26
@michael-schwarz
Copy link
Member

@sim642 Would you mind doing your review of this at your earliest convenience? I think I will be able to build on this for the detection if non-volatile locals are modified since a longjmp/setjmp for #970.

Copy link
Member

@sim642 sim642 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test group should be changed from 64-taint to 65-taint, because there's 64-noreturn now. Other than that this should be good.

@FelixKrayer
Copy link
Contributor Author

I rebased onto the master and renamed the test directory.
Additionally I changed a small thing in the base combine, so that new mappings that are also tainted are not added to the callee state and then updated again in combine_st. Changes nothing in the result but removes unnecessary computations

@michael-schwarz michael-schwarz merged commit 3fc209f into goblint:master Feb 16, 2023
michael-schwarz added a commit that referenced this pull request Feb 16, 2023
sim642 added a commit that referenced this pull request Feb 16, 2023
@sim642
Copy link
Member

sim642 commented Feb 17, 2023

Somehow 65-taint/04-multithread marshaling fails on MacOS: https://github.com/goblint/analyzer/actions/runs/4203078719.

@FelixKrayer FelixKrayer deleted the taint branch March 1, 2023 14:31
@sim642 sim642 mentioned this pull request Mar 20, 2023
10 tasks
@sim642 sim642 added this to the v2.2.0 milestone Apr 5, 2023
sim642 added a commit to sim642/opam-repository that referenced this pull request Sep 13, 2023
CHANGES:

* Add `setjmp`/`longjmp` analysis (goblint/analyzer#887, goblint/analyzer#970, goblint/analyzer#1015, goblint/analyzer#1019).
* Refactor race analysis to lazy distribution (goblint/analyzer#1084, goblint/analyzer#1089, goblint/analyzer#1136, goblint/analyzer#1016).
* Add thread-unsafe library function call analysis (goblint/analyzer#723, goblint/analyzer#1082).
* Add mutex type analysis and mutex API analysis (goblint/analyzer#800, goblint/analyzer#839, goblint/analyzer#1073).
* Add interval set domain and string literals domain (goblint/analyzer#901, goblint/analyzer#966, goblint/analyzer#994, goblint/analyzer#1048).
* Add affine equalities analysis (goblint/analyzer#592).
* Add use-after-free analysis (goblint/analyzer#1050, goblint/analyzer#1114).
* Add dead code elimination transformation (goblint/analyzer#850, goblint/analyzer#979).
* Add taint analysis for partial contexts (goblint/analyzer#553, goblint/analyzer#952).
* Add YAML witness validation via unassume (goblint/analyzer#796, goblint/analyzer#977, goblint/analyzer#1044, goblint/analyzer#1045, goblint/analyzer#1124).
* Add incremental analysis rename detection (goblint/analyzer#774, goblint/analyzer#777).
* Fix address sets unsoundness (goblint/analyzer#822, goblint/analyzer#967, goblint/analyzer#564, goblint/analyzer#1032, goblint/analyzer#998, goblint/analyzer#1031).
* Fix thread escape analysis unsoundness (goblint/analyzer#939, goblint/analyzer#984, goblint/analyzer#1074, goblint/analyzer#1078).
* Fix many incremental analysis issues (goblint/analyzer#627, goblint/analyzer#836, goblint/analyzer#835, goblint/analyzer#841, goblint/analyzer#932, goblint/analyzer#678, goblint/analyzer#942, goblint/analyzer#949, goblint/analyzer#950, goblint/analyzer#957, goblint/analyzer#955, goblint/analyzer#954, goblint/analyzer#960, goblint/analyzer#959, goblint/analyzer#1004, goblint/analyzer#558, goblint/analyzer#1010, goblint/analyzer#1091).
* Fix server mode for abstract debugging (goblint/analyzer#983, goblint/analyzer#990, goblint/analyzer#997, goblint/analyzer#1000, goblint/analyzer#1001, goblint/analyzer#1013, goblint/analyzer#1018, goblint/analyzer#1017, goblint/analyzer#1026, goblint/analyzer#1027).
* Add documentation for configuration JSON schema and OCaml API (goblint/analyzer#999, goblint/analyzer#1054, goblint/analyzer#1055, goblint/analyzer#1053).
* Add many library function specifications (goblint/analyzer#962, goblint/analyzer#996, goblint/analyzer#1028, goblint/analyzer#1079, goblint/analyzer#1121, goblint/analyzer#1135, goblint/analyzer#1138).
* Add OCaml 5.0 support (goblint/analyzer#1003, goblint/analyzer#945, goblint/analyzer#1162).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Track written lvalues per function and use this information in Base.combine
3 participants