Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abstract interpretation framework: allow selective domain retention #2572

Closed

Conversation

smowton
Copy link
Contributor

@smowton smowton commented Jul 10, 2018

This allows users of the ait abstract interpretation framework to specify that only certain states are needed after
its fixpoint computation is finished. For example, perhaps a particular analysis, or a particular user of that analysis,
only needs to know the results at END_FUNCTION instructions. In this case when a state does not need to be retained, and
only has a single successor, that state will be altered in place and passed to its successor instruction using std::move,
permitting an analysis to optimise away much (potentially all) state copying.

In the best case (a long, straight-line function whose state grows with every instruction, such as a constant propagator
reading "int x = 1; int y = 2; int z = 3; ...") this reduces analysis space and time complexity from quadratic to linear.

@smowton
Copy link
Contributor Author

smowton commented Jul 10, 2018

@tautschnig @peterschrammel @thk123 this definitely needs a unit test, since it is not yet exercised in cbmc, but the implementation is complete and ready for review.

@tautschnig
Copy link
Collaborator

  1. Would it be fair to describe this work as computing summaries?
  2. Unit tests are nice, but integrating into constant propagation would be the real deal.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR failed Diffblue compatibility checks (cbmc commit: 28c372d).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/78599041
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.

Common spurious failures:

  • the cbmc commit has disappeared in the mean time (e.g. in a force-push)
  • the author is not in the list of contributors (e.g. first-time contributors).

@smowton
Copy link
Contributor Author

smowton commented Jul 10, 2018

Will submit tests with this one then followup for constant prop (that patch will literally just pass through the predicate argument, since different constant prop users have different needs)

I wouldn't say this does summarisation, rather it lets you tell ait that you don't intend to make full use of the random access (by program point) interface it provides, therefore it can optimise to avoid storing those states you're not interested in if doing so isn't important for whatever fix point algorithm it uses.

A user who was generating summaries would presumably analyse a function at a time and pass a predicate saying they only care about the end instruction. Other cases I've had need for so far:

  • Only care about basic block heads
  • Only care about call instructions

@martin-cs
Copy link
Collaborator

martin-cs commented Jul 10, 2018 via email

@smowton
Copy link
Contributor Author

smowton commented Jul 11, 2018

Got a link to a branch?

@tautschnig
Copy link
Collaborator

I've got some alarm bells going off here for me: given that this field has seen decades of research, is this an implementation of something that has been published before, or is it possibly leading to a publication? Or should the solution much rather be in using sharing maps that @danpoe is working on?

@smowton
Copy link
Contributor Author

smowton commented Jul 11, 2018

I'd be very surprised if you could publish what to my mind is a fairly obvious optimisation. Dan's sharing maps are a great tool for when you do want to be able to access the result at every program point, and the program point results are closely related such that you can say "result at loc n+1 = result at loc n WITH x <- y" some or most of the time. On the other hand if you just don't need a result at most program points you can skip much of the complexity and do it like this.

Re: published before, I think you could argue goto-symex is using a pretty similar idea (working in-place for the most part and copying it only when necessary) apart from the fact that it doesn't cater for back-edges due to its unrolling approach.

@martin-cs
Copy link
Collaborator

@smowton : working on it but running SC^2 today and presenting at SMT tomorrow so it might not be today.

@tautschnig : as I see it, sharing maps are orthogonal to this optimisation. They help reduce the amount of memory used to store domains; this reduces the number of domains stored. As for publication ... if you know of a good source of published material on the actual practicalities of implementing an abstract interpreter I'd be interested to read it. The thing I've been working on I am hoping to publish.

@peterschrammel
Copy link
Member

peterschrammel commented Jul 11, 2018

Typically, an abstract interpreter stores the abstract state only at CFG cut points (eg loop heads and call sites of recursive functions), minimally required to decide when you have reached the fixed point (we don't have that optimisation right now - maybe @martin-cs has it in his implementation). Additionally, you might want to remember the abstract state in certain locations of interest (I think that's the intention of @smowton).

@smowton
Copy link
Contributor Author

smowton commented Jul 11, 2018

I somewhat approximate the cut points as "anything with multiple predecessors" because we lack a loop analysis, but otherwise yes that's what I'm doing.

@smowton smowton force-pushed the smowton/feature/ai-sparse-domains branch from 28c372d to 6b7b504 Compare July 11, 2018 14:55
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR failed Diffblue compatibility checks (cbmc commit: 6b7b504).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/78711776
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.

Common spurious failures:

  • the cbmc commit has disappeared in the mean time (e.g. in a force-push)
  • the author is not in the list of contributors (e.g. first-time contributors).

@smowton smowton force-pushed the smowton/feature/ai-sparse-domains branch from 6b7b504 to d7b4b91 Compare July 11, 2018 15:54
@smowton smowton requested a review from a team as a code owner July 11, 2018 15:54
@tautschnig
Copy link
Collaborator

My hesitation remains: the code and interface are becoming more complicated, and that's in a way that is non-standard. If adopting the terminology of "cut points" some of this is remedied. With regard to @martin-cs' comment: yes, I acknowledge that there is a difference between number-of-domains-stored vs amount-of-storage-required-for-domains, but the net effect to the end user isn't actually different?

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR failed Diffblue compatibility checks (cbmc commit: d7b4b91).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/78716828
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.

Common spurious failures:

  • the cbmc commit has disappeared in the mean time (e.g. in a force-push)
  • the author is not in the list of contributors (e.g. first-time contributors).

@smowton smowton force-pushed the smowton/feature/ai-sparse-domains branch from d7b4b91 to 82f11c9 Compare July 11, 2018 16:45
This allows users of the `ait` abstract interpretation framework to specify that only certain states are needed after
its fixpoint computation is finished. For example, perhaps a particular analysis, or a particular user of that analysis,
only needs to know the results at END_FUNCTION instructions. In this case when a state does not need to be retained, and
only has a single successor, that state will be altered in place and passed to its successor instruction using std::move,
permitting an analysis to optimise away much (potentially all) state copying.

In the best case (a long, straight-line function whose state grows with every instruction, such as a constant propagator
reading "int x = 1; int y = 2; int z = 3; ...") this reduces analysis space and time complexity from quadratic to linear.
@smowton smowton force-pushed the smowton/feature/ai-sparse-domains branch from 82f11c9 to a87ef74 Compare July 11, 2018 16:48
@smowton
Copy link
Contributor Author

smowton commented Jul 11, 2018

@peterschrammel @tautschnig @martin-cs updated this to (a) deal with the interprocedural edges correctly, and (b) include a reasonably thorough unit test. As structured it should be unable to break any existing users, since per default it retains all states.

@tautschnig I'm overapproximating what constitutes a cut-point, but I can add comments to that effect if you like? Re: sharing-map vs. this solution, this is just easier to implement, both for me and for the author / user of an ait-based analysis -- the space saving comes for free (just pass a predicate to ai_baset's constructor), and the time saving just requires implementing a moving version of merge, compared to using sharing-map which requires me to make sure my state data structures conform to sharing-map's constraints.

@peterschrammel I'm actually retaining state even at non-loop convergences (e.g. an if/else convergence) because I can't generally depend on instructions being visited in toplogical order due to backward GOTOs that aren't loop back-edges -- is there a formal way to describe that policy?

I was thinking of invariants that need to be true for this to work: I think I just need that
BOTTOM-MERGE-A == B-MERGE-A if B <= A, which I believe is a reasonable ask?

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR failed Diffblue compatibility checks (cbmc commit: 82f11c9).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/78722174
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.

Common spurious failures:

  • the cbmc commit has disappeared in the mean time (e.g. in a force-push)
  • the author is not in the list of contributors (e.g. first-time contributors).

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR failed Diffblue compatibility checks (cbmc commit: a87ef74).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/78722466
Status will be re-evaluated on next push.
Please contact @peterschrammel, @thk123, or @allredj for support.

Common spurious failures:

  • the cbmc commit has disappeared in the mean time (e.g. in a force-push)
  • the author is not in the list of contributors (e.g. first-time contributors).

@smowton
Copy link
Contributor Author

smowton commented Jul 12, 2018

(If @tautschnig and I have reached an impasse, @peterschrammel @martin-cs could you provide tie-breaking opinions?)

@peterschrammel
Copy link
Member

To break dependencies and avoid cluttering of existing interfaces, would it be possible to introduce your optimisation as an extra layer (ie extend rather than modify) so that existing analyses may derive from the interface that offers the optimisation or the original one?

@martin-cs
Copy link
Collaborator

martin-cs commented Jul 12, 2018 via email

@smowton
Copy link
Contributor Author

smowton commented Jul 13, 2018

@peterschrammel @martin-cs This functionality is already optional and default-off: the must-retain-state predicate is supplied as a constructor parameter that defaults to null (== retain all states, old behaviour). Implementing a move-merge operator on your domain is also optional (and orthogonal -- the combinations (move-merge, retain-all-states) and (copy-merge, retain-some-states) are both faster than the default (copy-merge, retain-all-states).

@smowton
Copy link
Contributor Author

smowton commented Jul 13, 2018

@peterschrammel I'm not convinced inheriting from ai_with_selective_domainst vs. ait would be any cleaner than supplying or not-supplying a constructor argument to ait? With inheritance we would either need to duplicate or provide a fairly ugly hook into the visit routine, and the inheriting analysis would need to provide some run-time customisation possibility anyhow if it wanted its users to be able to choose between storing all states and storing them selectively.

@martin-cs
Copy link
Collaborator

@smowton : I realise this is time critical / blocking for you but I have run out of time this weekend to work on this. I'm going to start skipping sessions tomorrow until I have some code for you to look at.

@martin-cs
Copy link
Collaborator

I had a chat with @smowton during the coffee break. I think we concluded:

  1. We will get together next week and see if we can merge effort and get things into trunk.
  2. In the mean time, @smowton will merge into the security scanner branch so it doesn't block them.

@martin-cs
Copy link
Collaborator

Thanks @peterschrammel this is on my immediate TODO anyway...

@smowton
Copy link
Contributor Author

smowton commented Mar 6, 2019

@martin-cs had any chance to work on history-sensitivity lately?

@martin-cs
Copy link
Collaborator

martin-cs commented Mar 21, 2019 via email

@TGWDB
Copy link
Contributor

TGWDB commented Jul 9, 2021

Closing this PR as it appears that no conclusion on how to progress/merge this work was both reached and followed through upon. If you believe this is erroneous please reopen and update.

@TGWDB TGWDB closed this Jul 9, 2021
@martin-cs
Copy link
Collaborator

Just for conclusion. The history patches did get merged and there is #2572 (comment) interface in the storage. To implement what @chrisr-diffblue originally asked for, one would simply need to:

  1. Implement prune to throw away the domain.
  2. Have abstract_state_before return a new TOP domain if anyone ever asks for a pruned location.
  3. Inherit from your chosen implementation of ai_baset and override ai_baset::visit_function_call so that it prunes locations when the function call is done.

If you prune every location apart from the last then you will get the biggest saving without loosing accuracy. If you prune every location with one inbound edge (apart from first and last) then you should avoid recomputing anything if it is called again. If you are feeling really really fancy then you can hook ai_baset::abstract_state_before and do on-the-fly (re-)computation of domains.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants