Abstract interpretation framework: allow selective domain retention #2572

smowton · 2018-07-10T17:13:49Z

This allows users of the ait abstract interpretation framework to specify that only certain states are needed after
its fixpoint computation is finished. For example, perhaps a particular analysis, or a particular user of that analysis,
only needs to know the results at END_FUNCTION instructions. In this case when a state does not need to be retained, and
only has a single successor, that state will be altered in place and passed to its successor instruction using std::move,
permitting an analysis to optimise away much (potentially all) state copying.

In the best case (a long, straight-line function whose state grows with every instruction, such as a constant propagator
reading "int x = 1; int y = 2; int z = 3; ...") this reduces analysis space and time complexity from quadratic to linear.

smowton · 2018-07-10T17:14:42Z

@tautschnig @peterschrammel @thk123 this definitely needs a unit test, since it is not yet exercised in cbmc, but the implementation is complete and ready for review.

tautschnig · 2018-07-10T17:21:56Z

Would it be fair to describe this work as computing summaries?
Unit tests are nice, but integrating into constant propagation would be the real deal.

allredj

This PR failed Diffblue compatibility checks (cbmc commit: 28c372d).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/78599041
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.

Common spurious failures:

the cbmc commit has disappeared in the mean time (e.g. in a force-push)
the author is not in the list of contributors (e.g. first-time contributors).

smowton · 2018-07-10T22:15:52Z

Will submit tests with this one then followup for constant prop (that patch will literally just pass through the predicate argument, since different constant prop users have different needs)

I wouldn't say this does summarisation, rather it lets you tell ait that you don't intend to make full use of the random access (by program point) interface it provides, therefore it can optimise to avoid storing those states you're not interested in if doing so isn't important for whatever fix point algorithm it uses.

A user who was generating summaries would presumably analyse a function at a time and pass a predicate saying they only care about the end instruction. Other cases I've had need for so far:

Only care about basic block heads
Only care about call instructions

martin-cs · 2018-07-10T23:29:22Z

Interesting and worthwhile although I think it will break the patch set that I've been working on for a while now to be able to add history sensitivity. It's a little late for me to try the integration now but would it be possible for us to work on getting both working together?

smowton · 2018-07-11T06:31:57Z

Got a link to a branch?

tautschnig · 2018-07-11T07:26:12Z

I've got some alarm bells going off here for me: given that this field has seen decades of research, is this an implementation of something that has been published before, or is it possibly leading to a publication? Or should the solution much rather be in using sharing maps that @danpoe is working on?

smowton · 2018-07-11T08:29:30Z

I'd be very surprised if you could publish what to my mind is a fairly obvious optimisation. Dan's sharing maps are a great tool for when you do want to be able to access the result at every program point, and the program point results are closely related such that you can say "result at loc n+1 = result at loc n WITH x <- y" some or most of the time. On the other hand if you just don't need a result at most program points you can skip much of the complexity and do it like this.

Re: published before, I think you could argue goto-symex is using a pretty similar idea (working in-place for the most part and copying it only when necessary) apart from the fact that it doesn't cater for back-edges due to its unrolling approach.

martin-cs · 2018-07-11T08:37:42Z

@smowton : working on it but running SC^2 today and presenting at SMT tomorrow so it might not be today.

@tautschnig : as I see it, sharing maps are orthogonal to this optimisation. They help reduce the amount of memory used to store domains; this reduces the number of domains stored. As for publication ... if you know of a good source of published material on the actual practicalities of implementing an abstract interpreter I'd be interested to read it. The thing I've been working on I am hoping to publish.

peterschrammel · 2018-07-11T11:27:49Z

Typically, an abstract interpreter stores the abstract state only at CFG cut points (eg loop heads and call sites of recursive functions), minimally required to decide when you have reached the fixed point (we don't have that optimisation right now - maybe @martin-cs has it in his implementation). Additionally, you might want to remember the abstract state in certain locations of interest (I think that's the intention of @smowton).

smowton · 2018-07-11T12:57:11Z

I somewhat approximate the cut points as "anything with multiple predecessors" because we lack a loop analysis, but otherwise yes that's what I'm doing.

allredj

This PR failed Diffblue compatibility checks (cbmc commit: 6b7b504).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/78711776
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.

Common spurious failures:

the cbmc commit has disappeared in the mean time (e.g. in a force-push)
the author is not in the list of contributors (e.g. first-time contributors).

tautschnig · 2018-07-11T15:57:48Z

My hesitation remains: the code and interface are becoming more complicated, and that's in a way that is non-standard. If adopting the terminology of "cut points" some of this is remedied. With regard to @martin-cs' comment: yes, I acknowledge that there is a difference between number-of-domains-stored vs amount-of-storage-required-for-domains, but the net effect to the end user isn't actually different?

allredj

This PR failed Diffblue compatibility checks (cbmc commit: d7b4b91).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/78716828
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.

Common spurious failures:

the cbmc commit has disappeared in the mean time (e.g. in a force-push)
the author is not in the list of contributors (e.g. first-time contributors).

This allows users of the `ait` abstract interpretation framework to specify that only certain states are needed after its fixpoint computation is finished. For example, perhaps a particular analysis, or a particular user of that analysis, only needs to know the results at END_FUNCTION instructions. In this case when a state does not need to be retained, and only has a single successor, that state will be altered in place and passed to its successor instruction using std::move, permitting an analysis to optimise away much (potentially all) state copying. In the best case (a long, straight-line function whose state grows with every instruction, such as a constant propagator reading "int x = 1; int y = 2; int z = 3; ...") this reduces analysis space and time complexity from quadratic to linear.

smowton · 2018-07-11T17:07:01Z

@peterschrammel @tautschnig @martin-cs updated this to (a) deal with the interprocedural edges correctly, and (b) include a reasonably thorough unit test. As structured it should be unable to break any existing users, since per default it retains all states.

@tautschnig I'm overapproximating what constitutes a cut-point, but I can add comments to that effect if you like? Re: sharing-map vs. this solution, this is just easier to implement, both for me and for the author / user of an ait-based analysis -- the space saving comes for free (just pass a predicate to ai_baset's constructor), and the time saving just requires implementing a moving version of merge, compared to using sharing-map which requires me to make sure my state data structures conform to sharing-map's constraints.

@peterschrammel I'm actually retaining state even at non-loop convergences (e.g. an if/else convergence) because I can't generally depend on instructions being visited in toplogical order due to backward GOTOs that aren't loop back-edges -- is there a formal way to describe that policy?

I was thinking of invariants that need to be true for this to work: I think I just need that
BOTTOM-MERGE-A == B-MERGE-A if B <= A, which I believe is a reasonable ask?

allredj

This PR failed Diffblue compatibility checks (cbmc commit: 82f11c9).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/78722174
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.

Common spurious failures:

the cbmc commit has disappeared in the mean time (e.g. in a force-push)
the author is not in the list of contributors (e.g. first-time contributors).

allredj

This PR failed Diffblue compatibility checks (cbmc commit: a87ef74).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/78722466
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.

Common spurious failures:

the cbmc commit has disappeared in the mean time (e.g. in a force-push)
the author is not in the list of contributors (e.g. first-time contributors).

smowton · 2018-07-12T14:58:25Z

(If @tautschnig and I have reached an impasse, @peterschrammel @martin-cs could you provide tie-breaking opinions?)

peterschrammel · 2018-07-12T15:08:41Z

To break dependencies and avoid cluttering of existing interfaces, would it be possible to introduce your optimisation as an extra layer (ie extend rather than modify) so that existing analyses may derive from the interface that offers the optimisation or the original one?

martin-cs · 2018-07-12T15:17:22Z

On Thu, 2018-07-12 at 14:58 +0000, Chris Smowton wrote: (If @tautschnig and I have reached an impasse, @peterschrammel @martin-cs could you provide tie-breaking opinions?)

@peterschrammel and I discussed this at lunch time. From my point of view we concluded the following: *. We need to be careful with the interface; we don't want to wind up with ait in the same mess as bmct. *. The functionality is useful and worth having but must be optional. *. It is not clear what the best way of integrating this is. I think I volunteered (now I am not running a workshop) to try to integrate this in my patch set.

smowton · 2018-07-13T10:35:59Z

@peterschrammel @martin-cs This functionality is already optional and default-off: the must-retain-state predicate is supplied as a constructor parameter that defaults to null (== retain all states, old behaviour). Implementing a move-merge operator on your domain is also optional (and orthogonal -- the combinations (move-merge, retain-all-states) and (copy-merge, retain-some-states) are both faster than the default (copy-merge, retain-all-states).

smowton · 2018-07-13T10:40:47Z

@peterschrammel I'm not convinced inheriting from ai_with_selective_domainst vs. ait would be any cleaner than supplying or not-supplying a constructor argument to ait? With inheritance we would either need to duplicate or provide a fairly ugly hook into the visit routine, and the inheriting analysis would need to provide some run-time customisation possibility anyhow if it wanted its users to be able to choose between storing all states and storing them selectively.

martin-cs · 2018-07-16T01:15:10Z

@smowton : I realise this is time critical / blocking for you but I have run out of time this weekend to work on this. I'm going to start skipping sessions tomorrow until I have some code for you to look at.

martin-cs · 2018-07-16T11:27:42Z

I had a chat with @smowton during the coffee break. I think we concluded:

We will get together next week and see if we can merge effort and get things into trunk.
In the mean time, @smowton will merge into the security scanner branch so it doesn't block them.

martin-cs · 2018-08-06T12:05:42Z

Thanks @peterschrammel this is on my immediate TODO anyway...

smowton · 2019-03-06T16:10:08Z

@martin-cs had any chance to work on history-sensitivity lately?

martin-cs · 2019-03-21T19:15:10Z

On Wed, 2019-03-06 at 16:10 +0000, Chris Smowton wrote: @martin-cs had any chance to work on history-sensitivity lately?

Actually yes! More time than I have had to catch up on my CPROVER correspondence any way! Getting the history framework done and merged is now my primary development task. @tautschnig has been very helpful in shaping my thoughts and I think I now have a way of getting all of the benefits of dropping domains but also allowing on-the-fly recomputation so the that users of ait are none the wiser. I'm going to refactor the current history sensitive framework over the next little while and was then thinking I might ping you, Peter, Daniel P, Chris R and Thomas for opinions.

TGWDB · 2021-07-09T12:54:14Z

Closing this PR as it appears that no conclusion on how to progress/merge this work was both reached and followed through upon. If you believe this is erroneous please reopen and update.

martin-cs · 2021-07-09T13:44:47Z

Just for conclusion. The history patches did get merged and there is #2572 (comment) interface in the storage. To implement what @chrisr-diffblue originally asked for, one would simply need to:

Implement prune to throw away the domain.
Have abstract_state_before return a new TOP domain if anyone ever asks for a pruned location.
Inherit from your chosen implementation of ai_baset and override ai_baset::visit_function_call so that it prunes locations when the function call is done.

If you prune every location apart from the last then you will get the biggest saving without loosing accuracy. If you prune every location with one inbound edge (apart from first and last) then you should avoid recomputing anything if it is called again. If you are feeling really really fancy then you can hook ai_baset::abstract_state_before and do on-the-fly (re-)computation of domains.

smowton requested review from chrisr-diffblue, martin-cs, peterschrammel and thk123 as code owners July 10, 2018 17:13

smowton requested a review from tautschnig July 10, 2018 17:14

allredj reviewed Jul 10, 2018

View reviewed changes

smowton force-pushed the smowton/feature/ai-sparse-domains branch from 28c372d to 6b7b504 Compare July 11, 2018 14:55

smowton requested review from kroening and pkesseli as code owners July 11, 2018 14:55

allredj reviewed Jul 11, 2018

View reviewed changes

smowton force-pushed the smowton/feature/ai-sparse-domains branch from 6b7b504 to d7b4b91 Compare July 11, 2018 15:54

smowton requested a review from a team as a code owner July 11, 2018 15:54

allredj reviewed Jul 11, 2018

View reviewed changes

smowton force-pushed the smowton/feature/ai-sparse-domains branch from d7b4b91 to 82f11c9 Compare July 11, 2018 16:45

smowton force-pushed the smowton/feature/ai-sparse-domains branch from 82f11c9 to a87ef74 Compare July 11, 2018 16:48

allredj reviewed Jul 11, 2018

View reviewed changes

smowton mentioned this pull request Jul 25, 2018

Add test for ait framework #2616

Merged

peterschrammel assigned smowton and martin-cs Aug 4, 2018

TGWDB closed this Jul 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Abstract interpretation framework: allow selective domain retention #2572

Abstract interpretation framework: allow selective domain retention #2572

smowton commented Jul 10, 2018

smowton commented Jul 10, 2018

tautschnig commented Jul 10, 2018

allredj left a comment

smowton commented Jul 10, 2018

martin-cs commented Jul 10, 2018 via email

smowton commented Jul 11, 2018

tautschnig commented Jul 11, 2018

smowton commented Jul 11, 2018

martin-cs commented Jul 11, 2018

peterschrammel commented Jul 11, 2018 •

edited

Loading

smowton commented Jul 11, 2018

allredj left a comment

tautschnig commented Jul 11, 2018

allredj left a comment

smowton commented Jul 11, 2018

allredj left a comment

allredj left a comment

smowton commented Jul 12, 2018

peterschrammel commented Jul 12, 2018

martin-cs commented Jul 12, 2018 via email

smowton commented Jul 13, 2018 •

edited

Loading

smowton commented Jul 13, 2018

martin-cs commented Jul 16, 2018

martin-cs commented Jul 16, 2018

martin-cs commented Aug 6, 2018

smowton commented Mar 6, 2019

martin-cs commented Mar 21, 2019 via email

TGWDB commented Jul 9, 2021

martin-cs commented Jul 9, 2021

Abstract interpretation framework: allow selective domain retention #2572

Abstract interpretation framework: allow selective domain retention #2572

Conversation

smowton commented Jul 10, 2018

smowton commented Jul 10, 2018

tautschnig commented Jul 10, 2018

allredj left a comment

Choose a reason for hiding this comment

smowton commented Jul 10, 2018

martin-cs commented Jul 10, 2018 via email

smowton commented Jul 11, 2018

tautschnig commented Jul 11, 2018

smowton commented Jul 11, 2018

martin-cs commented Jul 11, 2018

peterschrammel commented Jul 11, 2018 • edited Loading

smowton commented Jul 11, 2018

allredj left a comment

Choose a reason for hiding this comment

tautschnig commented Jul 11, 2018

allredj left a comment

Choose a reason for hiding this comment

smowton commented Jul 11, 2018

allredj left a comment

Choose a reason for hiding this comment

allredj left a comment

Choose a reason for hiding this comment

smowton commented Jul 12, 2018

peterschrammel commented Jul 12, 2018

martin-cs commented Jul 12, 2018 via email

smowton commented Jul 13, 2018 • edited Loading

smowton commented Jul 13, 2018

martin-cs commented Jul 16, 2018

martin-cs commented Jul 16, 2018

martin-cs commented Aug 6, 2018

smowton commented Mar 6, 2019

martin-cs commented Mar 21, 2019 via email

TGWDB commented Jul 9, 2021

martin-cs commented Jul 9, 2021

peterschrammel commented Jul 11, 2018 •

edited

Loading

smowton commented Jul 13, 2018 •

edited

Loading