Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WeakKeyDictionary slow in large-scale Agent Initialization #2232

Closed
EwoutH opened this issue Aug 20, 2024 · 8 comments
Closed

WeakKeyDictionary slow in large-scale Agent Initialization #2232

EwoutH opened this issue Aug 20, 2024 · 8 comments

Comments

@EwoutH
Copy link
Member

EwoutH commented Aug 20, 2024

In a model where 10.000 agents are initialized, the majority of the runtime, about 20 seconds, is spend in the update() method of the WeakKeyDictionary, which is also called about 10.000 times.

I think it could be done a lot faster the dict wasn't updated on each Agent creating, but once all agents were created.

Maybe related to #2221.

class WeakKeyDictionary(_collections_abc.MutableMapping):
    """ Mapping class that references keys weakly.

    Entries in the dictionary will be discarded when there is no
    longer a strong reference to the key. This can be used to
    associate additional data with an object owned by other parts of
    an application without adding attributes to those objects. This
    can be especially useful with objects that override attribute
    accesses.
    """

    def __init__(self, dict=None):
        self.data = {}
        def remove(k, selfref=ref(self)):
            self = selfref()
            if self is not None:
                if self._iterating:
                    self._pending_removals.append(k)
                else:
                    try:
                        del self.data[k]
                    except KeyError:
                        pass
        self._remove = remove
        # A list of dead weakrefs (keys to be removed)
        self._pending_removals = []
        self._iterating = set()
        self._dirty_len = False
        if dict is not None:
            self.update(dict)   # Calling this function for each agent takes up the majority of the run time.


    def update(self, dict=None, /, **kwargs):
        d = self.data
        if dict is not None:
            if not hasattr(dict, "items"):
                dict = type({})(dict)
            for key, value in dict.items():
                d[ref(key, self._remove)] = value
        if len(kwargs):
            self.update(kwargs)

Edit: A model run of a quite complex GIS / network model, running with 25000 agents and vehicle simulations. The WeakKeyDictionary initialization takes up 61.1% of runtime, all during Agent initialisation.

image

@quaquel
Copy link
Member

quaquel commented Aug 20, 2024

What is your exact testcode? Model.agents creates an agentset but does not actively maintain one. So not sure what would explain your results in a realistic test scenario.

@EwoutH
Copy link
Member Author

EwoutH commented Aug 20, 2024

Noticed it at https://github.com/EwoutH/urban-self-driving-effects/tree/main/model, change n_agents=100.

I will try to create a minimal reproducible example.

For now I'm just deleting the whole WeakRefDict, I don't add or remove agents after init so I don't need it.

@EwoutH
Copy link
Member Author

EwoutH commented Aug 20, 2024

Replacing the WeakRefDict with a regular dict sped the process up by about 10 to 15x

image

@quaquel
Copy link
Member

quaquel commented Aug 20, 2024

 self.agents.add(Traveler(i, self, locations[i], gdf["65x65 Nummer"][locations[i]]))

Why are you doing this? Just create the agents. They add themselves to the model. This is tied to #2224. Each model.agents call creates a new agent set.

@EwoutH
Copy link
Member Author

EwoutH commented Aug 20, 2024

Good catch. This helps a lot.

Proves you can spend as much time you want on a library and still make silly mistakes (or at least I can)

@quaquel
Copy link
Member

quaquel commented Aug 20, 2024

True, but it also shows how silly the behavior identified in #2224 is. We probably need to do some kind of lazy creation of the AgentSet. So in Model we track whether agents have been added, if so a flag is set to true and if model.agents is called a new AgentSet is created and the flag is reset. If no agents have been added, the old agentset can be returned. Since we use weakrefs, there is no need to track agent removal. This minimizes overhead (it would not have prevented your issue but will fix #2224).

@EwoutH
Copy link
Member Author

EwoutH commented Aug 20, 2024

It also went wrong right from the very first commit EwoutH/urban-self-driving-effects@c395ec2

We probably need to do some kind of lazy creation of the AgentSet.

Yeah agree. There's still something to be won there.

Thanks for catching that error, much appreciated!

@EwoutH EwoutH closed this as not planned Won't fix, can't repro, duplicate, stale Aug 20, 2024
EwoutH added a commit to EwoutH/urban-self-driving-effects that referenced this issue Aug 20, 2024
This isn't necessary, because on Agent creating that's handled automatically. The performance is now fine and the model runs properly with up to a million agents.

Thanks for catching it Jan!

Ref: projectmesa/mesa#2232
Co-Authored-By: Jan Kwakkel <[email protected]>
@Corvince
Copy link
Contributor

Good catch. This helps a lot.

Proves you can spend as much time you want on a library and still make silly mistakes (or at least I can)

In your defense, it was necessary for the schedulers to iteratively add agents before AgentSet was introduced.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants