Sending a Map across a process may degrade hashing performance #283
Labels:

- bug: Defects, unintended behaviour, etc
- performance: Changes related to improving performance
- std: Changes related to the standard library
Inko's Map implementation caches the hash value of its pairs, and reuses these
hash codes in several places. When the keys of a Map are regular objects (e.g.
a custom defined `User` object), the hash code is derived from the object's
memory address.
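As a rough analogue, an address-derived identity hash can be sketched in Python, where `id()` returns an object's address in CPython (the helper name and truncation are hypothetical, not Inko's actual derivation):

```python
class User:
    """Stand-in for a custom defined object with no custom hash."""


def identity_hash(obj):
    # Hypothetical sketch: derive the hash code from the object's memory
    # address (CPython's id() returns the address), truncated to 32 bits.
    return id(obj) & 0xFFFFFFFF


a = User()
b = User()
# Two distinct live objects occupy distinct addresses, so their
# identity-derived hashes will (almost always) differ.
```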
When a process A sends a map M to process B, all data is copied into B's heap.
As part of this process, the addresses of the copied objects change. Because the
addresses change, so will the hash values.
This may degrade performance of Map objects sent across processes, as the cached
hash values will no longer match the true hash values.
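The failure mode can be sketched with a toy Python analogue (not Inko's actual implementation): a map that caches hashes at insertion time places pairs in buckets derived from the old address, so a lookup that hashes the key's new address probes the wrong bucket.

```python
class Pair:
    def __init__(self, key, value, hash_code):
        self.key = key
        self.value = value
        self.hash = hash_code  # cached at insertion time


class CachedHashMap:
    """Toy open-addressing map that caches each pair's hash code."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity

    def insert(self, key, value, hash_code):
        index = hash_code % len(self.slots)
        while self.slots[index] is not None:  # linear probing
            index = (index + 1) % len(self.slots)
        self.slots[index] = Pair(key, value, hash_code)

    def lookup(self, key, hash_code):
        index = hash_code % len(self.slots)
        for _ in range(len(self.slots)):
            pair = self.slots[index]
            if pair is None:
                return None
            if pair.key == key:
                return pair.value
            index = (index + 1) % len(self.slots)
        return None


table = CachedHashMap()
# Insert while the key lives at (pretend) address 0x1001 in process A.
table.insert("user", 42, 0x1001)
# After copying into process B the key lives at 0x2004; hashing the new
# address lands in a different bucket, and the lookup finds nothing.
```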
This may not be a problem in practice, though. Hash collisions would require
process B to at some point reuse the memory (that once contained the source of
the copied object) released by process A. But when this does happen, there may
be an increase in hash collisions.
The cached hash value is used in two places:

- `Map.insert_pair`
- `Map.rehash_pair`

The method `Map.insert_pair` is only used when adding new keys, so we can pass
the hash value as an additional argument. `Map.rehash_pair` is used for
rehashing every pair in the Map. Calculating hash values again may end up being
expensive, especially for large maps.
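A Python sketch of the two call sites (the names mirror `Map.insert_pair` and `Map.rehash_pair`, but the code is a hypothetical analogue): insertion can simply accept the hash the caller already computed, while rehashing must produce a hash for every pair, which is the potentially expensive pass.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Pair:
    key: Any
    value: Any
    hash: int  # cached hash code


def insert_pair(slots, pair, hash_code):
    # New keys arrive with a hash the caller just computed, so it can be
    # passed in as an extra argument instead of being derived again.
    pair.hash = hash_code
    index = hash_code % len(slots)
    while slots[index] is not None:  # linear probing
        index = (index + 1) % len(slots)
    slots[index] = pair


def rehash(slots, hash_func):
    # Rehashing recomputes every pair's hash; for a large map this full
    # pass over all keys is the potentially expensive part.
    new_slots = [None] * (len(slots) * 2)
    for pair in slots:
        if pair is not None:
            insert_pair(new_slots, pair, hash_func(pair.key))
    return new_slots
```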
One possible solution is to store the owning process in a Map. When rehashing,
we check if this equals the current process. If not, we first recompute all
hash values, then update this field to point to the current process. For this
to work, we'd have to optimise `std::process.current` to not allocate a new
process object on every call; instead it should reuse the same object within
the same process. Without this optimisation, the check to perform
(`cached process == current process`) would be too expensive.

With that said, this solution is very specific to Map objects. I can see there
being more cases where copying an object may require recomputing/invalidating
some data. We could introduce some sort of trait (e.g. `AfterCopy`) that
provides a method, called when receiving a message. This method can then be
used to do whatever is necessary. This however will introduce additional
overhead for every message received, and many (if not almost all) messages
will not make use of this hook, meaning the overhead is a waste.
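Such a hook could look roughly like this (hedged Python sketch; `AfterCopy`, `after_copy`, and `receive_message` are hypothetical names, and `id()` stands in for an address-derived hash):

```python
class AfterCopy:
    """Hypothetical trait: implementors are notified after their data is
    copied into the receiving process's heap."""

    def after_copy(self):
        raise NotImplementedError


class Map(AfterCopy):
    def __init__(self):
        self.pairs = []  # (key, value, cached_hash) triples

    def insert(self, key, value):
        self.pairs.append((key, value, id(key)))

    def after_copy(self):
        # Recompute every cached hash from the keys' new addresses.
        self.pairs = [(k, v, id(k)) for k, v, _ in self.pairs]


def receive_message(message):
    # The runtime would run this check for *every* message, which is the
    # overhead the hook imposes even on messages that don't need it.
    if isinstance(message, AfterCopy):
        message.after_copy()
    return message
```

Here the per-message `isinstance` check stands in for the dispatch cost described above: it is paid on every receive, even though most messages never implement the hook.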
I'm currently not sure what the best (and least complex/over-engineered)
solution to this problem is.