
Get normalized scores #110

Merged (11 commits, Jul 17, 2023)

Conversation

rodrigodelazcano
Member

@rodrigodelazcano rodrigodelazcano commented Jul 11, 2023

Description

This PR adds the get_normalized_score() function from D4RL for policy testing and comparison. This function is added as a utility instead of a method on every MinariDataset, since we haven't decided whether we want to keep this feature for all future datasets in Minari, and not all offline RL tasks can be tested in this manner (due to the complexity of generating an expert policy for a reference maximum score).

This function requires the dataset to have two attributes, ref_min_score and ref_max_score, which can be set when creating the dataset. These attributes are optional; if they are not present in the dataset, get_normalized_score() will throw an error.
To set these attributes, the user can either pass them directly when creating the dataset, or pass a callable expert policy from which the episode returns are obtained. Note that if ref_min_score is not passed, it will be computed internally with a random policy.
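For context, the D4RL-style normalization this utility computes can be sketched as follows (a minimal standalone version for illustration; the actual function reads ref_min_score and ref_max_score from the dataset's metadata):

```python
def normalized_score(raw_score: float, ref_min_score: float, ref_max_score: float) -> float:
    """Map a raw episode return onto a reference scale:
    ~0 for a random policy, ~1 for the expert policy."""
    if ref_max_score == ref_min_score:
        raise ValueError("Reference scores must differ to normalize a score.")
    return (raw_score - ref_min_score) / (ref_max_score - ref_min_score)
```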

I'm using this PR to also fix an issue with storing dictionary observation spaces in the DataCollectorV0, as well as the environment not being reset when terminated or truncated. I added a comment on the specific line changes.

This PR is a Draft because I need to update the PointMaze dataset generation tutorial to include the reference scores. There are some issues and differences with the D4RL PointMaze reference scores that I'll list here.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.
To upload images to a PR -- simply drag and drop or copy paste.

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have run pytest -v and no errors are present.
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I solved any possible warnings that pytest -v has generated that are related to my code to the best of my knowledge.
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@@ -230,7 +230,12 @@ def step(
        # or truncation. This may happen if the step_data_callback truncates or terminates the episode under
        # certain conditions.
        if self._new_episode and not self._reset_called:
            self._buffer[-1]["observations"] = [self._previous_eps_final_obs]
            if isinstance(self._previous_eps_final_obs, dict):
Member Author

Fix for dictionary observation spaces

Collaborator

If possible, I think it would be good to add a test for this behavior.

minari/utils.py Outdated
@@ -197,6 +239,11 @@ def create_dataset_from_buffers(
author (Optional[str], optional): author that generated the dataset. Defaults to None.
author_email (Optional[str], optional): email of the author that generated the dataset. Defaults to None.
code_permalink (Optional[str], optional): link to relevant code used to generate the dataset. Defaults to None.
ref_min_score( Optional[float], optional): minimum reference score from the average returns of a random policy. This value is later used to normalize a score with :meth:`minari.get_normalized_score`. If default None the value will be estimated with a default random policy.
Member

small typo with spacing ref_min_score( Optional[float]-> ref_min_score (Optional[float]

@@ -304,6 +376,11 @@ def create_dataset_from_collector_env(
author (Optional[str], optional): author that generated the dataset. Defaults to None.
author_email (Optional[str], optional): email of the author that generated the dataset. Defaults to None.
code_permalink (Optional[str], optional): link to relevant code used to generate the dataset. Defaults to None.
ref_min_score( Optional[float], optional): minimum reference score from the average returns of a random policy. This value is later used to normalize a score with :meth:`minari.get_normalized_score`. If default None the value will be estimated with a default random policy.
Member

spacing typo

Comment on lines +425 to +435
    if expert_policy is not None or ref_max_score is not None:
        env = copy.deepcopy(collector_env.env)
        if ref_min_score is None:
            ref_min_score = get_average_reference_score(
                env, RandomPolicy(env), num_episodes_average_score
            )

        if expert_policy is not None:
            ref_max_score = get_average_reference_score(
                env, expert_policy, num_episodes_average_score
            )
Member

duplicate of lines 330-340; it can be factorized.
Actually, I think we should try to refactor create_dataset_from_collector_env + DataCollectorV0.save_to_disk and create_dataset_from_buffers, but we can do it in another PR
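For illustration, the duplicated blocks could be collapsed into a small helper along these lines (a hypothetical sketch, not part of the PR: compute_reference_scores and the average_score_fn callable are invented names standing in for the environment-rollout logic):

```python
def compute_reference_scores(average_score_fn, expert_policy, random_policy,
                             ref_min_score=None, ref_max_score=None):
    """Compute min/max reference scores only when they can actually be used."""
    # If no expert policy and no max score are given, there is nothing to normalize against.
    if expert_policy is None and ref_max_score is None:
        return ref_min_score, ref_max_score
    # Fall back to a random policy for the minimum reference score.
    if ref_min_score is None:
        ref_min_score = average_score_fn(random_policy)
    # An explicit expert policy determines the maximum reference score.
    if expert_policy is not None:
        ref_max_score = average_score_fn(expert_policy)
    return ref_min_score, ref_max_score
```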

Member Author

Sounds good, I'll update this section and then we can look into making other refactors

Comment on lines +478 to +483
    with h5py.File(dataset.spec.data_path, "r") as f:
        ref_min_score = f.attrs.get("ref_min_score", default=None)
        ref_max_score = f.attrs.get("ref_max_score", default=None)
        if ref_min_score is None or ref_max_score is None:
            raise ValueError(
                f"Reference score not provided for dataset {dataset.spec.dataset_id}. Can't compute the normalized score."
Member

Maybe it is better to access those attributes through MinariStorage
Should we make the MinariDataset._data gettable?

Member Author

@rodrigodelazcano commented Jul 13, 2023

The idea here is to avoid extra optional attributes in the MinariStorage and MinariDataset. What we can do is move this function to the MinariStorage (which uses h5py), and then call it through another method in MinariDataset.

def get_normalized_score(self, score):
    return self._data.get_normalized_score(score)

Do you think this is a better option? In my opinion I prefer the current implementation, to keep this feature as separate as possible from the dataset structures until we decide if we are going to maintain it at all.

Collaborator

Is it possible get_normalized_score will be totally removed from the codebase? I think it depends on what you mean by not maintaining it. If it's possible get_normalized_score will be totally removed, then I think the h5py is fine for now, but if get_normalized_scores is likely to stick around in any form in the long run, I think it should be a function of MinariStorage exactly as you described.

Member

The idea here is to avoid extra optional attributes in the MinariStorage and MinariDataset. What we can do is move this function to the MinariStorage (which uses h5py), and then call it through another method in MinariDataset.

def get_normalized_score(self, score):
    return self._data.get_normalized_score(score)

Do you think this is a better option? In my opinion I prefer the current implementation, to keep this feature as separate as possible from the dataset structures until we decide if we are going to maintain it at all.

I prefer the current implementation. I was also thinking of a general method get_attr to get some attribute from the data (and throw an error if there is no such attribute)
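A get_attr along those lines might look like this (purely illustrative, with a plain dict standing in for the HDF5 root attributes; none of these names are Minari's actual API):

```python
class StorageSketch:
    """Toy stand-in for a storage backend with a generic attribute accessor."""

    def __init__(self, attrs: dict):
        self._attrs = attrs  # stand-in for the h5py file's root attrs

    def get_attr(self, name: str):
        # Raise if the dataset was created without this optional attribute.
        if name not in self._attrs:
            raise AttributeError(f"Dataset has no attribute '{name}'.")
        return self._attrs[name]
```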

Member Author

I like the idea of get_attr, can you add this in a separate PR @younik?

minari/utils.py Outdated
@@ -197,6 +239,11 @@ def create_dataset_from_buffers(
author (Optional[str], optional): author that generated the dataset. Defaults to None.
author_email (Optional[str], optional): email of the author that generated the dataset. Defaults to None.
code_permalink (Optional[str], optional): link to relevant code used to generate the dataset. Defaults to None.
ref_min_score( Optional[float], optional): minimum reference score from the average returns of a random policy. This value is later used to normalize a score with :meth:`minari.get_normalized_score`. If default None the value will be estimated with a default random policy.
Contributor

should we even compute min score if max score is not provided? we won't be able to use it anyway

Member Author

This is already being done: we first check if an expert_policy or ref_max_score is passed, and only then compute the respective attributes. I'll update the docs; I agree that functionality isn't very clear.

Collaborator

@balisujohn left a comment

Are we committing to at least leaving a version of get_normalized_scores in the codebase? If so, I think it would be good to write some sort of test for it. If not, then I think it's fine to hold off on tests for now.


@rodrigodelazcano
Member Author

Are we committing to at least leaving a version of get_normalized_scores in the codebase? If so, I think it would be good to write some sort of test for it. If not, then I think it's fine to hold off on tests for now.

We should keep it in the codebase going forward, at least for some of the D4RL datasets. I'll add the tests :)

@rodrigodelazcano
Member Author


Are we committing to at least leaving a version of get_normalized_scores in the codebase? If so, I think it would be good to write some sort of test for it. If not, then I think it's fine to hold off on tests for now.

I added a test for this fix #110 (comment). We can add the test for get_normalized_score in another PR

@rodrigodelazcano rodrigodelazcano marked this pull request as ready for review July 17, 2023 20:40
@balisujohn
Collaborator

LGTM :^)

@balisujohn balisujohn merged commit cebc46c into Farama-Foundation:main Jul 17, 2023