Boltzman performance issue #2224
Comments
Thanks for writing this up and doing the benchmarks. Agreed, we need to look into this. I would like to get most of the time- and scheduling-related work out of the way first, to see how those parts demand access to agents, and then take a critical look at how we can save our agents elegantly.
Could you check how fast this is?
import itertools
itertools.chain.from_iterable(model.agents_.values())
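For readers unfamiliar with the call being suggested, here is a minimal sketch of what it does. It assumes the model keeps a dictionary that maps each agent class to a list of agents, as described in the issue text; `agents_by_class` and its string contents are hypothetical stand-ins, not Mesa objects.

```python
import itertools

# Hypothetical stand-in for a class -> list-of-agents dictionary.
agents_by_class = {
    "A": ["a0", "a1", "a2"],
    "B": ["b0", "b1"],
}

# Lazy: walks the per-class lists one after another without copying them.
lazy_agents = itertools.chain.from_iterable(agents_by_class.values())

# Eager: materializes every agent into one new list (cost grows with population size).
all_agents = list(itertools.chain.from_iterable(agents_by_class.values()))
```

Note that a `chain` object has no length and cannot be indexed, so anything that needs random access (such as picking a random agent) would first have to materialize it.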
That’s terrible. Good to know that dict-to-list conversion is a no-go.
In the past, a user could "optimize" the data structure by choosing
We use a WeakKeyDictionary in AgentSet. There is no WeakKeyList. Moreover, we want a dict for fast set-like operations. So, no, I don't think that is going to work. It might be possible to have a dict of WeakKeyDictionaries. However, I am wondering how common this kind of operation will be and whether it is common enough to justify redesigning the data structure for it. My idea is rather to solve the problem inside the

For the activation-by-type scenario, everything hinges on how fast a group_by operation is and whether you must redo it every step. For models where agents are not added or removed over time, you need to do this operation only once, so there is no bottleneck. Only if the number of agents changes might it be beneficial to actively maintain a dict with agents by class.
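The "group once, reuse every step" idea can be sketched in plain Python. This is an illustration under assumptions, not Mesa's group-by implementation: `Wolf`, `Sheep`, and `group_by_class` are hypothetical names.

```python
from collections import defaultdict


class Wolf:
    pass


class Sheep:
    pass


def group_by_class(agents):
    """Group agents by their concrete class in a single pass."""
    groups = defaultdict(list)
    for agent in agents:
        groups[type(agent)].append(agent)
    return dict(groups)


agents = [Wolf(), Sheep(), Sheep()]

# If no agents are added or removed, this grouping can be computed once
# and reused on every step instead of being rebuilt.
by_class = group_by_class(agents)
```

When agents are born or die during the run, the grouping would either have to be recomputed or kept up to date on each registration and removal, which is the trade-off discussed above.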
Births and deaths should be quite common, but not ubiquitous enough to be the default that they currently are. As such, I can accept having an explicit but optional argument to the
I made some progress on this. In short, I have done the following
Below you can see the results for a small benchmark. Original is the original model with the retrieval of the agents via

As can be seen, the updated version is quite a bit better than the current implementation, but it still falls short of the pure list implementation. My suspicion is that this is because
I have continued my testing and investigation. Below you see a comparison between the three options (original, modified handling of model.agents, pure list) out to 10k agents. The original is not entirely complete because I ran out of time, but the story is clear. The updated version is much better than the original version and comes closer to the performance of a pure list implementation.

Now, let's look more closely at just the updated version and the pure list version (see below). We still see that the updated version scales faster than linearly with the number of agents. Why? Let's work backward from the code in the Boltzmann model. In the step method of the agent, this model randomly selects an agent from all agents in the model. Specifically, the agent uses

So, where does this leave us? I am not sure. I see a few options and would love other people's thoughts (@rht, @EwoutH, @Corvince).
In #2251, I am making steady progress on the described enhancements. Below, we see a new comparison. Original is now the version with the new implementation inside Model and using

Below, I separate out Updated and List to get a finer look at their difference, which is now marginal. The difference here is solely due to the resolving of weak references inside MoneyAgent.
Looks awesome! I agree that the remaining performance gap is negligible now.
This PR is a performance enhancement for Model.agents. It emerged from a discussion on [the weird scaling performance of the Boltzman wealth model](#2224).
# Key changes
- model.agents now returns the agentset as maintained by the model, rather than a new copy based on the hard references.
- Agent registration and deregistration have been moved from the Agent into the model. The agent now calls model.register and model.deregister. This encapsulates everything cleanly inside the model class and makes Agent less dependent on the inner details of how Model manages the hard references to agents.
- The setup of the relevant data structures is moved into its own helper method; again, this cleans up the code.
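The registration pattern described in this PR summary can be sketched with the standard library alone. This is a simplified illustration, not the actual Mesa code: only the names `model.register`, `model.deregister`, and `model.agents` come from the text, while `_setup_agent_registration`, `_hard_refs`, `_all_agents`, and `Agent.remove` are hypothetical, and a plain `WeakKeyDictionary` stands in for the AgentSet.

```python
import weakref


class Model:
    def __init__(self):
        self._setup_agent_registration()

    def _setup_agent_registration(self):
        """Hypothetical helper: create the containers in one place."""
        self._hard_refs = {}  # agent -> None, keeps agents alive
        self._all_agents = weakref.WeakKeyDictionary()  # stand-in for an AgentSet

    @property
    def agents(self):
        # Returns the maintained container; no copy is made on access.
        return self._all_agents

    def register(self, agent):
        self._hard_refs[agent] = None
        self._all_agents[agent] = None

    def deregister(self, agent):
        del self._hard_refs[agent]
        del self._all_agents[agent]


class Agent:
    def __init__(self, model):
        self.model = model
        # The agent only calls the model; it knows nothing about the containers.
        self.model.register(self)

    def remove(self):
        self.model.deregister(self)


model = Model()
agent = Agent(model)
```

Because the agent only talks to `register` and `deregister`, the model is free to change how it stores references without touching the agent class.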
Given that the issue is localized to step performance rather than also including agent register/deregister, I think so. I'm relieved that Mesa 3.0 is back to being (relatively) fast again. Thank you @quaquel!
In mesa-frames, there has been a bit of discussion on the performance of the Boltzmann wealth model. I have investigated the issue and concluded that the problem is not with the AgentSet as used in the scheduler. Instead, the problem stems from how a user can access all agents in the model.
Below, we see the performance of 3 versions of the model. Original is the current example version of the model. List is a version of the model where the scheduler is replaced with a list, and in the Agent, this list is also used to select another agent with whom to swap wealth. Test is an intermediate version. The model uses a regular scheduler but the Agent uses the list.
Below are the key code snippets
As seen in the first plot, the original version scales poorly, but list and test seem very similar. So, below, we compare these in more detail up to 10k agents. As can be seen, using a list within the Agent solves virtually all performance loss. The small remaining difference in performance is the overhead of using AgentSet inside the scheduler instead of using a simple list. This is entirely due to the reliance on weak references in AgentSet.
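To make the "List" variant concrete, here is a minimal pure-Python sketch of a wealth-exchange model in which agents live in a plain list and pick their partner from that same list. It is not the benchmarked code from this issue: the class layout, the `run` function, and the rule that an agent with positive wealth gives one unit to a randomly chosen agent are assumptions made for illustration.

```python
import random


class MoneyAgent:
    def __init__(self, agents):
        self.wealth = 1
        self.agents = agents  # shared plain list holding hard references

    def step(self, rng):
        if self.wealth > 0:
            # Picking from a list is a constant-time index lookup.
            other = rng.choice(self.agents)
            other.wealth += 1
            self.wealth -= 1


def run(n_agents, n_steps, seed=42):
    rng = random.Random(seed)
    agents = []
    for _ in range(n_agents):
        agents.append(MoneyAgent(agents))
    for _ in range(n_steps):
        rng.shuffle(agents)  # random activation order
        for agent in agents:
            agent.step(rng)
    return agents


agents = run(n_agents=100, n_steps=10)
total_wealth = sum(agent.wealth for agent in agents)
```

In this sketch the per-agent work does not depend on the population size, so the cost of one step grows linearly with the number of agents.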
So what does this mean? The way in which a user can get hold of all agents in the model needs to be refined. Currently, the model has a dictionary with classes as keys and lists with hard references as values. Every time you access model.agents, this is copied into a new AgentSet. This creates a lot of overhead. For Mesa 3, I suggest having both a list with the hard references and an actively maintained AgentSet with all agents. Model.agents can then always return this AgentSet, while the list with hard references can be a private attribute on the model.

I also want to know if we need to maintain the dict with Agent classes as keys and lists of agents as values. With AgentSet.groupby, this is redundant unless a user needs frequent and fast access to agents by class and the agents in the model change a lot from one tick to the next.
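The overhead argument can be illustrated by counting how many references get copied in one step when every agent looks up the full population once. This is a sketch under assumptions: `CopyingModel`, `MaintainingModel`, `items_copied`, and `one_step` are hypothetical names, and a plain `WeakKeyDictionary` stands in for the AgentSet.

```python
import weakref


class Agent:
    pass


class CopyingModel:
    """Rebuilds a weak-reference container on every access."""

    def __init__(self, n):
        self._agents = [Agent() for _ in range(n)]  # hard references
        self.items_copied = 0

    @property
    def agents(self):
        self.items_copied += len(self._agents)
        return weakref.WeakKeyDictionary(dict.fromkeys(self._agents))


class MaintainingModel:
    """Builds the container once and returns the same object on access."""

    def __init__(self, n):
        self._agents = [Agent() for _ in range(n)]  # private hard references
        self._agentset = weakref.WeakKeyDictionary(dict.fromkeys(self._agents))
        self.items_copied = 0

    @property
    def agents(self):
        return self._agentset


def one_step(model):
    # Each agent asks for the full population once during its step.
    for _ in range(len(model._agents)):
        _ = model.agents
    return model.items_copied


copied = one_step(CopyingModel(100))
maintained = one_step(MaintainingModel(100))
```

With n agents, the copying variant touches n references on each of n lookups, so the work per step grows quadratically, while the maintained variant does no copying at all.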