Get normalized scores #110
Conversation
```diff
@@ -230,7 +230,12 @@ def step(
         # or truncation. This may happen if the step_data_callback truncates or terminates the episode under
         # certain conditions.
         if self._new_episode and not self._reset_called:
-            self._buffer[-1]["observations"] = [self._previous_eps_final_obs]
+            if isinstance(self._previous_eps_final_obs, dict):
```
Fix for dictionary observation spaces
If possible, I think it would be good to add a test for this behavior.
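For context, the dict branch in the diff above is needed because a flat (array) observation can be wrapped in a list directly, while a `Dict` observation space has to be rebuilt per key so later steps can append values key by key. A minimal standalone sketch of the distinction — all names here are illustrative, not Minari's actual internals:

```python
# Sketch: seeding a fresh buffer entry with the final observation of the
# previous episode. Dict observation spaces need per-key handling; flat
# (array) observations can simply be wrapped in a list.
def seed_observations(buffer_entry, previous_final_obs):
    if isinstance(previous_final_obs, dict):
        # One list per key, so subsequent steps can append per-key values.
        buffer_entry["observations"] = {
            key: [value] for key, value in previous_final_obs.items()
        }
    else:
        buffer_entry["observations"] = [previous_final_obs]
    return buffer_entry
```

A test along these lines could exercise both branches with a `Dict`-space observation and a plain array one.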
minari/utils.py
Outdated
```diff
@@ -197,6 +239,11 @@ def create_dataset_from_buffers(
         author (Optional[str], optional): author that generated the dataset. Defaults to None.
         author_email (Optional[str], optional): email of the author that generated the dataset. Defaults to None.
         code_permalink (Optional[str], optional): link to relevant code used to generate the dataset. Defaults to None.
+        ref_min_score( Optional[float], optional): minimum reference score from the average returns of a random policy. This value is later used to normalize a score with :meth:`minari.get_normalized_score`. If default None the value will be estimated with a default random policy.
```
Small typo with spacing: `ref_min_score( Optional[float]` -> `ref_min_score (Optional[float]`
```diff
@@ -304,6 +376,11 @@ def create_dataset_from_collector_env(
         author (Optional[str], optional): author that generated the dataset. Defaults to None.
         author_email (Optional[str], optional): email of the author that generated the dataset. Defaults to None.
         code_permalink (Optional[str], optional): link to relevant code used to generate the dataset. Defaults to None.
+        ref_min_score( Optional[float], optional): minimum reference score from the average returns of a random policy. This value is later used to normalize a score with :meth:`minari.get_normalized_score`. If default None the value will be estimated with a default random policy.
```
spacing typo
```python
if expert_policy is not None or ref_max_score is not None:
    env = copy.deepcopy(collector_env.env)
    if ref_min_score is None:
        ref_min_score = get_average_reference_score(
            env, RandomPolicy(env), num_episodes_average_score
        )

    if expert_policy is not None:
        ref_max_score = get_average_reference_score(
            env, expert_policy, num_episodes_average_score
        )
```
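A hedged sketch of what `get_average_reference_score` presumably does: roll the given policy out for a number of episodes and average the undiscounted returns. The real Minari helper runs on Gymnasium environments; `ToyEnv` below is an illustrative stand-in that follows the same `reset`/`step` API:

```python
class ToyEnv:
    """Three-step episode paying reward 1.0 per step (illustrative only)."""

    def reset(self):
        self._t = 0
        return 0, {}  # (observation, info), Gymnasium-style

    def step(self, action):
        self._t += 1
        # (obs, reward, terminated, truncated, info)
        return 0, 1.0, self._t >= 3, False, {}


def average_reference_score(env, policy, num_episodes):
    # Average undiscounted return over num_episodes rollouts of `policy`.
    returns = []
    for _ in range(num_episodes):
        obs, _ = env.reset()
        done = False
        episode_return = 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            episode_return += reward
            done = terminated or truncated
        returns.append(episode_return)
    return sum(returns) / len(returns)
```

With a random policy this gives `ref_min_score`; with an expert policy, `ref_max_score`.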
Duplicate of 330-340; it can be factorized. Actually, I think we should try to refactor `create_dataset_from_collector_env` + `DataCollectorV0.save_to_disk` and `create_dataset_from_buffers`, but we can do it in another PR.
Sounds good, I'll update this section and then we can look into making other refactors
```python
with h5py.File(dataset.spec.data_path, "r") as f:
    ref_min_score = f.attrs.get("ref_min_score", default=None)
    ref_max_score = f.attrs.get("ref_max_score", default=None)
if ref_min_score is None or ref_max_score is None:
    raise ValueError(
        f"Reference score not provided for dataset {dataset.spec.dataset_id}. Can't compute the normalized score."
    )
```
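Once both references are available, the D4RL-style normalization itself is a linear rescaling; a minimal sketch (the actual signature of `minari.get_normalized_score` may differ):

```python
def normalized_score(raw_score, ref_min_score, ref_max_score):
    # 0.0 corresponds to the random-policy return (ref_min_score) and
    # 1.0 to the expert return (ref_max_score); D4RL papers usually
    # report this value multiplied by 100.
    return (raw_score - ref_min_score) / (ref_max_score - ref_min_score)
```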
Maybe it is better to access those attributes through `MinariStorage`. Should we make `MinariDataset._data` gettable?
The idea here is to avoid extra optional attributes in `MinariStorage` and `MinariDataset`. What we could do is move this function to `MinariStorage` (which uses `h5py`), and then call it through another method in `MinariDataset`:

```python
def get_normalized_score(self, score):
    return self._data.get_normalized_score(score)
```

Do you think this is a better option? In my opinion, I prefer the current implementation, to keep this feature as separate as possible from the dataset structures until we decide whether we are going to maintain it at all.
Is it possible `get_normalized_score` will be totally removed from the codebase? I think it depends on what you mean by not maintaining it. If it's possible `get_normalized_score` will be totally removed, then I think the h5py approach is fine for now, but if `get_normalized_score` is likely to stick around in any form in the long run, I think it should be a function of `MinariStorage` exactly as you described.
> The idea here is to avoid extra optional attributes in `MinariStorage` and `MinariDataset`. What we could do is move this function to `MinariStorage` (which uses `h5py`), and then call it through another method in `MinariDataset`. [...] Do you think this is a better option? In my opinion, I prefer the current implementation, to keep this feature as separate as possible from the dataset structures until we decide whether we are going to maintain it at all.
I prefer the current implementation. I was also thinking of a general method `get_attr` to get some attribute from the data (and throw an error if there is no such attribute).
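A `get_attr` along those lines could look roughly like this — a standalone sketch, not the actual `MinariStorage` API:

```python
class StorageSketch:
    """Stand-in for MinariStorage attribute access (illustrative only)."""

    def __init__(self, attrs):
        # In MinariStorage this would be the HDF5 file's attrs mapping.
        self._attrs = attrs

    def get_attr(self, name):
        # Fail loudly when the attribute was never stored with the dataset,
        # instead of propagating None into downstream computations.
        if name not in self._attrs:
            raise AttributeError(f"Dataset has no attribute '{name}'")
        return self._attrs[name]
```

The normalized-score code could then call `storage.get_attr("ref_min_score")` and let the error surface naturally when a dataset lacks reference scores.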
I like the idea of `get_attr` — can you add this in a separate PR @younik?
minari/utils.py
Outdated
```diff
@@ -197,6 +239,11 @@ def create_dataset_from_buffers(
         author (Optional[str], optional): author that generated the dataset. Defaults to None.
         author_email (Optional[str], optional): email of the author that generated the dataset. Defaults to None.
         code_permalink (Optional[str], optional): link to relevant code used to generate the dataset. Defaults to None.
+        ref_min_score( Optional[float], optional): minimum reference score from the average returns of a random policy. This value is later used to normalize a score with :meth:`minari.get_normalized_score`. If default None the value will be estimated with a default random policy.
```
Should we even compute the min score if the max score is not provided? We won't be able to use it anyway.
This is already being done: we first check if an `expert_policy` or `ref_max_score` is passed, and only then compute the respective attributes. I'll update the docs; I agree that functionality isn't very clear.
Are we committing to at least leaving a version of `get_normalized_score` in the codebase? If so, I think it would be good to write some sort of test for it. If not, then I think it's fine to hold off on tests for now.
```diff
@@ -230,7 +230,12 @@ def step(
         # or truncation. This may happen if the step_data_callback truncates or terminates the episode under
         # certain conditions.
         if self._new_episode and not self._reset_called:
-            self._buffer[-1]["observations"] = [self._previous_eps_final_obs]
+            if isinstance(self._previous_eps_final_obs, dict):
```
If possible, I think it would be good to add a test for this behavior.
We should keep it in the codebase going forward, at least for some of the D4RL datasets. I'll add the tests :)
I added a test for this fix #110 (comment). We can make the test for
LGTM :^)
Description

This PR adds the `get_normalized_score()` function from D4RL for policy testing and comparison. The function is added as a utility instead of a method on every `MinariDataset`, since we haven't decided whether we want to keep this feature for all future datasets in Minari, and not all offline RL tasks can be tested in this manner (due to the complexity of generating an expert policy for a reference maximum score).

This function requires the dataset to have two attributes which can be set when creating the dataset, `ref_min_score` and `ref_max_score`. These attributes are optional; if they are not present in the dataset, `get_normalized_score()` will throw an error. To set these attributes, they can be passed directly by the user when creating the dataset, or set by passing a callable expert policy from which to obtain the episode returns. Note that if `ref_min_score` is not passed, it will be computed internally with a random policy.

I'm also using this PR to fix an issue with storing dictionary observation spaces in the `DataCollectorV0` and not resetting the environment when `terminated` or `truncated`. I added a comment on the specific line changes.

This PR is a Draft because I need to update the PointMaze dataset generation tutorial to include the reference scores. There are some issues and differences with the D4RL PointMaze reference scores that I'll list here.
Type of change

Screenshots

Checklist:

- [ ] I have run the `pre-commit` checks with `pre-commit run --all-files` (see `CONTRIBUTING.md` instructions to set it up)
- [ ] I have run `pytest -v` and no errors are present.
- [ ] I have solved any possible warnings that `pytest -v` has generated that are related to my code, to the best of my knowledge.