-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Taint analysis to improve the precision loss of partial contexts #952
Conversation
Thank you for the PR! I'll most likely only get around to review it in the first week of January. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this small typo is fixed, it looks good to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conceptually the taintPartialContexts analysis is quite similar to our access analysis: recording accessed lvals. It probably would be possible to merge the two, but that's not necessary now.
src/analyses/base.ml
Outdated
| Some (`Array _) when (get_string "ana.base.arrays.domain") = "partitioned" -> begin | ||
(* partitioned arrays cannot be copied by individual lvalues, so if tainted just copy the whole callee value for the array variable *) | ||
let new_arry_opt = CPA.find_opt v fun_st.cpa in | ||
match new_arry_opt with | ||
| None -> st | ||
| Some new_arry -> {st with cpa = CPA.add v new_arry st.cpa} | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just wondering, is there anything that could be done to avoid having to special case this outside of the array domain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be possible to taint the whole array when a lval inside a partitioned array is written.
However special casing this inside the taintPartialContext
analysis does not seem like a better approach, as there might be other analyses that can benefit from the finer taint information for partitioned arrays.
Other than that I was not able to come up with a better idea to address this issue.
src/analyses/base.ml
Outdated
if M.tracing then M.trace "taintPC" "updating %a; type: %a\n" Lval.CilLval.pretty (v, o) d_type lval_type; | ||
match CPA.find_opt v (fun_st.cpa) with | ||
| None -> st | ||
| Some (`Array _) when (get_string "ana.base.arrays.domain") = "partitioned" -> begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the latest version, some arrays may be partitioned, while others are not: One would need to check for this specific array here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have fixed this by adding domain_of_t
to the signature of ArrayDomain.S
and using this to check if the Array is partitioned:
| Some ('Array a) when (CArrays.domain_of_t a) = PartitionedDomain ->
A different approach I thought of was to use ArrayDomain.get_domain
:
(ArrayDomain.get_domain ~varAttr:v.vattr ~typAttr:(typeAttrs lval_type)) = PartitionedDomain
I chose the first idea as it seemed more straightforward to "ask the array what it is" instead of inspecting the attributes, but I will of course change the approach if you prefer not adding a function to the ArrayDomain.S
signature.
1dcc477
to
86ac4fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has there been an SV-COMP run with this analysis enabled? Given that a regression test from sv-benchmarks has been adapted, I would guess so?
Yes, here they are: results-felix-all.zip. The first run is: base ctx_insens, no taint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test group should be changed from 64-taint to 65-taint, because there's 64-noreturn now. Other than that this should be good.
I rebased onto the master and renamed the test directory. |
included in base and apron analysis
+ change analysis to use identity spec
- in base, the tainted set is filtered, so that completely new values from callee are not in the tainted set and copied again
Somehow 65-taint/04-multithread marshaling fails on MacOS: https://github.com/goblint/analyzer/actions/runs/4203078719. |
CHANGES: * Add `setjmp`/`longjmp` analysis (goblint/analyzer#887, goblint/analyzer#970, goblint/analyzer#1015, goblint/analyzer#1019). * Refactor race analysis to lazy distribution (goblint/analyzer#1084, goblint/analyzer#1089, goblint/analyzer#1136, goblint/analyzer#1016). * Add thread-unsafe library function call analysis (goblint/analyzer#723, goblint/analyzer#1082). * Add mutex type analysis and mutex API analysis (goblint/analyzer#800, goblint/analyzer#839, goblint/analyzer#1073). * Add interval set domain and string literals domain (goblint/analyzer#901, goblint/analyzer#966, goblint/analyzer#994, goblint/analyzer#1048). * Add affine equalities analysis (goblint/analyzer#592). * Add use-after-free analysis (goblint/analyzer#1050, goblint/analyzer#1114). * Add dead code elimination transformation (goblint/analyzer#850, goblint/analyzer#979). * Add taint analysis for partial contexts (goblint/analyzer#553, goblint/analyzer#952). * Add YAML witness validation via unassume (goblint/analyzer#796, goblint/analyzer#977, goblint/analyzer#1044, goblint/analyzer#1045, goblint/analyzer#1124). * Add incremental analysis rename detection (goblint/analyzer#774, goblint/analyzer#777). * Fix address sets unsoundness (goblint/analyzer#822, goblint/analyzer#967, goblint/analyzer#564, goblint/analyzer#1032, goblint/analyzer#998, goblint/analyzer#1031). * Fix thread escape analysis unsoundness (goblint/analyzer#939, goblint/analyzer#984, goblint/analyzer#1074, goblint/analyzer#1078). * Fix many incremental analysis issues (goblint/analyzer#627, goblint/analyzer#836, goblint/analyzer#835, goblint/analyzer#841, goblint/analyzer#932, goblint/analyzer#678, goblint/analyzer#942, goblint/analyzer#949, goblint/analyzer#950, goblint/analyzer#957, goblint/analyzer#955, goblint/analyzer#954, goblint/analyzer#960, goblint/analyzer#959, goblint/analyzer#1004, goblint/analyzer#558, goblint/analyzer#1010, goblint/analyzer#1091). * Fix server mode for abstract debugging (goblint/analyzer#983, goblint/analyzer#990, goblint/analyzer#997, goblint/analyzer#1000, goblint/analyzer#1001, goblint/analyzer#1013, goblint/analyzer#1018, goblint/analyzer#1017, goblint/analyzer#1026, goblint/analyzer#1027). * Add documentation for configuration JSON schema and OCaml API (goblint/analyzer#999, goblint/analyzer#1054, goblint/analyzer#1055, goblint/analyzer#1053). * Add many library function specifications (goblint/analyzer#962, goblint/analyzer#996, goblint/analyzer#1028, goblint/analyzer#1079, goblint/analyzer#1121, goblint/analyzer#1135, goblint/analyzer#1138). * Add OCaml 5.0 support (goblint/analyzer#1003, goblint/analyzer#945, goblint/analyzer#1162).
Adding a new analysis
taintPartialContexts
to track modified lvalues per function. The gained information is used inbase
,relationAnalysis.apron
,condVars
andvarEq
to keep unwritten values. (Issue #553)The analysis
taintPartialContexts
tracks a set of modified values for each function, starting with the empty set when entering a function. This set can be accessed via a new queryMayBeTainted
, which is used in the combine of the four mentioned analyses to improve precision when partial contexts are used.base
: thecpa
from the caller is updated:relationAnalysis.apron
: from the caller only the globals and reachable locals are removed, which are in the tainted set. The resulting state is unified with the callee state (only making the caller state more precise)condVars
: only untainted information is kept. Callee state is not merged.varEq
: only untainted information is kept from caller. Callee state is mergedA necessary but noteworthy change was the addition of the
(f_ask : Queries.ask)
to the signature of the combine function, which allows querying the return state of the called function.Regression tests for each analysis have been added as well as some for specific edge cases.