Major environment refactoring (base version) #169
Conversation
🚀
demand_distribution: Union[
    int, float, type, Callable
] = Uniform,
vehicle_capacity: float = 1.0,
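A parameter typed `Union[int, float, type, Callable]` like `demand_distribution` above typically gets dispatched on its runtime type. The following is only an illustrative sketch of that pattern; `make_sampler` is a hypothetical helper, not RL4CO's actual implementation.

```python
# Sketch of dispatching a Union[int, float, type, Callable] parameter.
# `make_sampler` is a hypothetical name, not the actual RL4CO API.
import random
from typing import Callable, Union


def make_sampler(dist: Union[int, float, type, Callable]) -> Callable[[], float]:
    # A plain number is interpreted as a constant value for every node.
    if isinstance(dist, (int, float)):
        return lambda: float(dist)
    # A class or function is treated as a sampler and called directly.
    if callable(dist):
        return dist
    raise TypeError(f"Unsupported distribution: {dist!r}")


assert make_sampler(2)() == 2.0
draw = make_sampler(lambda: random.uniform(0.0, 1.0))()
assert 0.0 <= draw <= 1.0
```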
I think vehicle_capacity and capacity should be the same parameter, right?
EDIT: my bad, capacity is actually the "maximum capacity" used for normalization.
Yes, maybe later we will want to rename capacity. I think it may be confusing to some users for now.
Ok, let's leave it as simple as possible for now
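The distinction settled in this thread can be sketched as follows. The values and helper names are purely illustrative assumptions, not RL4CO's actual code: `capacity` acts as the raw "maximum capacity" used to normalize demands, while `vehicle_capacity` is the capacity the environment sees after normalization.

```python
# Illustrative sketch (hypothetical values, not RL4CO's implementation):
# `capacity` is the raw maximum used as a normalization constant,
# `vehicle_capacity` is the normalized capacity the environment works with.
import random

num_loc = 20
capacity = 30.0          # raw "maximum capacity" (normalization constant)
vehicle_capacity = 1.0   # normalized capacity seen by the environment

raw_demands = [random.randint(1, 9) for _ in range(num_loc)]
demands = [d / capacity for d in raw_demands]

# After normalization, every demand is a fraction of the vehicle capacity.
assert all(0.0 < d <= vehicle_capacity for d in demands)
```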
log = get_pylogger(__name__)
class MPDPEnv(RL4COEnvBase):
-    """Multi-agent Pickup and Delivery Problem environment.
+    """Multi-agent Pickup and Delivery Problem (mPDP) environment.
Note that this should actually be called "mPDTSP", but I guess we can leave it like this for now.
self.depot_sampler = get_sampler("depot", depot_distribution, min_loc, max_loc, **kwargs)

# Prize distribution
self.deterministic_prize_sampler = get_sampler("deterministric_prize", "uniform", 0.0, 4.0/self.num_loc, **kwargs)
This one cannot be changed via kwargs, right?
Yes, since there is a specific rule for the value range of deterministic_prize, and also for the following stochastic_prize. Here I hardcode the distribution to Uniform.
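The hardcoded rule described above can be sketched like this: the deterministic prize is drawn uniformly from [0, 4/num_loc], so its range is tied to the instance size and cannot be overridden via kwargs. `sample_deterministic_prize` is a hypothetical helper for illustration, not RL4CO's actual API.

```python
# Hypothetical sketch of the hardcoded deterministic-prize rule:
# values are drawn uniformly from [0, 4 / num_loc], so the upper bound
# depends on the number of locations. Not the actual RL4CO implementation.
import random


def sample_deterministic_prize(num_loc: int) -> list:
    high = 4.0 / num_loc  # fixed upper bound tied to num_loc
    return [random.uniform(0.0, high) for _ in range(num_loc)]


prizes = sample_deterministic_prize(50)
assert len(prizes) == 50
assert all(0.0 <= p <= 4.0 / 50 for p in prizes)
```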
Description
Please refer to the full environment refactor PR: #166.
Motivation and Context
Please refer to the full environment refactor PR: #166.
Types of changes
Compared with the full environment refactor, this base refactor only touches:
And as a base refactor, this PR keeps:
Overall, this base version PR only moves code around and, as a consequence, does not change any logic. In future versions, we will refactor the environments step by step, guided by the full refactor version.
For more details, please refer to the full environment refactor PR: #166.
Checklist