Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sending a Map across a process may degrade hashing performance #283

Closed
yorickpeterse opened this issue Sep 3, 2020 · 3 comments
Closed
Labels
bug Defects, unintended behaviour, etc performance Changes related to improving performance std Changes related to the standard library

Comments

@yorickpeterse
Copy link
Collaborator

Inko's Map implementation caches the hash value of its pairs, and reuses these
hash codes in several places. When the keys of a Map are regular objects (e.g. a
custom defined User object), the hash code is derived from the object's memory
address.

When a process A sends a map M to process B, all data is copied into B's heap.
As part of this process, the addresses of the copied objects change. Because the
addresses change, so will the hash values.

This may degrade performance of Map objects sent across processes, as the cached
hash values will no longer match the true hash values.

This may not be a problem in practise though. Hash collisions would require
process B to at some point reuse the memory (that once contained the source of
the copied object) released by process A. But when this does happen, there may
be an increase in hash collisions.

The cached hash value is used in two places:

  • Map.insert_pair
  • Map.rehash_pair

The method Map.insert_pair is only used when adding new keys, so we can pass
the hash value as an additional argument. Map.rehash_pair is used for
rehashing every pair in the Map. Calculating hash values again may end up being
expensive, especially for large maps.

One possible solution is to store the owning process in a Map. When rehashing,
we check if this equals the current process. If not, we first recompute all hash
values, then update this field to point to the current process. For this to
work, we'd have to optimise std::process.current to not allocate a new process
object on every call; instead it should use the same object within the same
process. Without this optimisation, the check to perform (cached process == current process) would be too expensive.

With that said, this solution is very specific to Map objects. I can see there
being more cases where copying an object may require recomputing/invalidating
some data. We could introduce some sort of trait (e.g. AfterCopy) that
provides a method, called when receiving a message. This method can then be used
to do whatever is necessary. This however will introduce additional overhead for
every message received, and many (if not almost all) will not make use of this
hook; meaning the overhead is a waste.

I'm currently not sure what the best (and least complex/over-engineered)
solution to this problem is.

@yorickpeterse
Copy link
Collaborator Author

There may be another option. Issue
https://gitlab.com/inko-lang/inko/-/issues/201 discusses changing the memory
layout for objects. I've also been toying with the idea of adding classes at the
VM level, so we can get rid of prototypes/inheritance entirely. Assuming for a
moment this is implemented, we could use the pointer to the class as the hash
source; not the pointer to the instance. This would increase the possibility
of collisions (e.g. multiple instances of the same User type would produce the
same hash value), but decouple the hash codes from process-local memory.

We could also just leave the code as-is. The likelihood of collisions in the
current setup may be so low it's just not worth the effort to change things.

@yorickpeterse
Copy link
Collaborator Author

There's a second problem related to this: the current copying logic doesn't ensure the same source object when copied multiple times produces the same output object. That is, if copying encounters A twice, it produces A₁ and A₂ as copies; instead of producing A₁ twice. This then breaks the == test for objects that rely on pointer equality.

@yorickpeterse
Copy link
Collaborator Author

In https://gitlab.com/inko-lang/inko/-/merge_requests/120 we're moving to an ownership model and the use of move semantics. Sending messages no longer involves copying, instead ownership of the data is transferred to the receiving process.

@yorickpeterse yorickpeterse added bug Defects, unintended behaviour, etc performance Changes related to improving performance and removed type::bug labels Mar 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Defects, unintended behaviour, etc performance Changes related to improving performance std Changes related to the standard library
Projects
None yet
Development

No branches or pull requests

1 participant