Get normalized scores #110
Conversation
```diff
@@ -230,7 +230,12 @@ def step(
         # or truncation. This may happen if the step_data_callback truncates or terminates the episode under
         # certain conditions.
         if self._new_episode and not self._reset_called:
-            self._buffer[-1]["observations"] = [self._previous_eps_final_obs]
+            if isinstance(self._previous_eps_final_obs, dict):
```
Fix for dictionary observation spaces
If possible, I think it would be good to add a test for this behavior.
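For context, the dict branch in the diff above is needed because a flat (array) observation can be wrapped in a list directly, while a `Dict` observation space has to be rebuilt per key so later steps can append values key by key. A minimal standalone sketch of the distinction — all names here are illustrative, not Minari's actual internals:

```python
# Sketch: seeding a fresh buffer entry with the final observation of the
# previous episode. Dict observation spaces need per-key handling; flat
# (array) observations can simply be wrapped in a list.
def seed_observations(buffer_entry, previous_final_obs):
    if isinstance(previous_final_obs, dict):
        # One list per key, so subsequent steps can append per-key values.
        buffer_entry["observations"] = {
            key: [value] for key, value in previous_final_obs.items()
        }
    else:
        buffer_entry["observations"] = [previous_final_obs]
    return buffer_entry
```

A test along these lines could exercise both branches with a `Dict`-space observation and a plain array one.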
minari/utils.py
Outdated
```diff
@@ -197,6 +239,11 @@ def create_dataset_from_buffers(
         author (Optional[str], optional): author that generated the dataset. Defaults to None.
         author_email (Optional[str], optional): email of the author that generated the dataset. Defaults to None.
         code_permalink (Optional[str], optional): link to relevant code used to generate the dataset. Defaults to None.
+        ref_min_score( Optional[float], optional): minimum reference score from the average returns of a random policy. This value is later used to normalize a score with :meth:`minari.get_normalized_score`. If default None the value will be estimated with a default random policy.
```
Small typo with spacing: `ref_min_score( Optional[float]` -> `ref_min_score (Optional[float]`
```diff
@@ -304,6 +376,11 @@ def create_dataset_from_collector_env(
         author (Optional[str], optional): author that generated the dataset. Defaults to None.
         author_email (Optional[str], optional): email of the author that generated the dataset. Defaults to None.
         code_permalink (Optional[str], optional): link to relevant code used to generate the dataset. Defaults to None.
+        ref_min_score( Optional[float], optional): minimum reference score from the average returns of a random policy. This value is later used to normalize a score with :meth:`minari.get_normalized_score`. If default None the value will be estimated with a default random policy.
```
spacing typo
```python
if expert_policy is not None or ref_max_score is not None:
    env = copy.deepcopy(collector_env.env)
    if ref_min_score is None:
        ref_min_score = get_average_reference_score(
            env, RandomPolicy(env), num_episodes_average_score
        )

    if expert_policy is not None:
        ref_max_score = get_average_reference_score(
            env, expert_policy, num_episodes_average_score
        )
```
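A hedged sketch of what `get_average_reference_score` presumably does: roll the given policy out for a number of episodes and average the undiscounted returns. The real Minari helper runs on Gymnasium environments; `ToyEnv` below is an illustrative stand-in that follows the same `reset`/`step` API:

```python
class ToyEnv:
    """Three-step episode paying reward 1.0 per step (illustrative only)."""

    def reset(self):
        self._t = 0
        return 0, {}  # (observation, info), Gymnasium-style

    def step(self, action):
        self._t += 1
        # (obs, reward, terminated, truncated, info)
        return 0, 1.0, self._t >= 3, False, {}


def average_reference_score(env, policy, num_episodes):
    # Average undiscounted return over num_episodes rollouts of `policy`.
    returns = []
    for _ in range(num_episodes):
        obs, _ = env.reset()
        done = False
        episode_return = 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            episode_return += reward
            done = terminated or truncated
        returns.append(episode_return)
    return sum(returns) / len(returns)
```

With a random policy this gives `ref_min_score`; with an expert policy, `ref_max_score`.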
Duplicate of 330-340; it can be factorized. Actually, I think we should try to refactor `create_dataset_from_collector_env` + `DataCollectorV0.save_to_disk` and `create_dataset_from_buffers`, but we can do it in another PR.
Sounds good, I'll update this section and then we can look into making other refactors
```python
with h5py.File(dataset.spec.data_path, "r") as f:
    ref_min_score = f.attrs.get("ref_min_score", default=None)
    ref_max_score = f.attrs.get("ref_max_score", default=None)
if ref_min_score is None or ref_max_score is None:
    raise ValueError(
        f"Reference score not provided for dataset {dataset.spec.dataset_id}. Can't compute the normalized score."
    )
```
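Once both references are available, the D4RL-style normalization itself is a linear rescaling; a minimal sketch (the actual signature of `minari.get_normalized_score` may differ):

```python
def normalized_score(raw_score, ref_min_score, ref_max_score):
    # 0.0 corresponds to the random-policy return (ref_min_score) and
    # 1.0 to the expert return (ref_max_score); D4RL papers usually
    # report this value multiplied by 100.
    return (raw_score - ref_min_score) / (ref_max_score - ref_min_score)
```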
Maybe it is better to access those attributes through `MinariStorage`. Should we make `MinariDataset._data` gettable?
The idea here is to avoid extra optional attributes in `MinariStorage` and `MinariDataset`. What we could do is move this function to `MinariStorage` (which uses `h5py`), and then call it through another method in `MinariDataset`:

```python
def get_normalized_score(self, score):
    return self._data.get_normalized_score(score)
```

Do you think this is a better option? In my opinion, I prefer the current implementation, to keep this feature as separate as possible from the dataset structures until we decide whether we are going to maintain it at all.
Is it possible `get_normalized_score` will be totally removed from the codebase? I think it depends on what you mean by not maintaining it. If it's possible `get_normalized_score` will be totally removed, then I think the h5py approach is fine for now, but if `get_normalized_score` is likely to stick around in any form in the long run, I think it should be a function of `MinariStorage` exactly as you described.
> The idea here is to avoid extra optional attributes in `MinariStorage` and `MinariDataset`. What we could do is move this function to `MinariStorage` (which uses `h5py`), and then call it through another method in `MinariDataset`. [...] Do you think this is a better option? In my opinion, I prefer the current implementation, to keep this feature as separate as possible from the dataset structures until we decide whether we are going to maintain it at all.
I prefer the current implementation. I was also thinking of a general method `get_attr` to get some attribute from the data (and throw an error if there is no such attribute).
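A `get_attr` along those lines could look roughly like this — a standalone sketch, not the actual `MinariStorage` API:

```python
class StorageSketch:
    """Stand-in for MinariStorage attribute access (illustrative only)."""

    def __init__(self, attrs):
        # In MinariStorage this would be the HDF5 file's attrs mapping.
        self._attrs = attrs

    def get_attr(self, name):
        # Fail loudly when the attribute was never stored with the dataset,
        # instead of propagating None into downstream computations.
        if name not in self._attrs:
            raise AttributeError(f"Dataset has no attribute '{name}'")
        return self._attrs[name]
```

The normalized-score code could then call `storage.get_attr("ref_min_score")` and let the error surface naturally when a dataset lacks reference scores.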
I like the idea of `get_attr` — can you add this in a separate PR @younik?
minari/utils.py
Outdated
```diff
@@ -197,6 +239,11 @@ def create_dataset_from_buffers(
         author (Optional[str], optional): author that generated the dataset. Defaults to None.
         author_email (Optional[str], optional): email of the author that generated the dataset. Defaults to None.
         code_permalink (Optional[str], optional): link to relevant code used to generate the dataset. Defaults to None.
+        ref_min_score( Optional[float], optional): minimum reference score from the average returns of a random policy. This value is later used to normalize a score with :meth:`minari.get_normalized_score`. If default None the value will be estimated with a default random policy.
```
Should we even compute the min score if the max score is not provided? We won't be able to use it anyway.
This is already being done: we first check if an `expert_policy` or `ref_max_score` is passed, and only then compute the respective attributes. I'll update the docs; I agree that functionality isn't very clear.
Are we committing to at least leaving a version of `get_normalized_score` in the codebase? If so, I think it would be good to write some sort of test for it. If not, then I think it's fine to hold off on tests for now.
```diff
@@ -230,7 +230,12 @@ def step(
         # or truncation. This may happen if the step_data_callback truncates or terminates the episode under
         # certain conditions.
         if self._new_episode and not self._reset_called:
-            self._buffer[-1]["observations"] = [self._previous_eps_final_obs]
+            if isinstance(self._previous_eps_final_obs, dict):
```
If possible, I think it would be good to add a test for this behavior.
We should keep it in the codebase going forward, at least for some of the D4RL datasets. I'll add the tests :)
I added a test for this fix #110 (comment). We can make the test for
LGTM :^)
Description

This PR adds the `get_normalized_score()` function from D4RL for policy testing and comparison. The function is added as a utility instead of a method on every `MinariDataset`, since we haven't decided whether we want to keep this feature for all future datasets in Minari, and not all offline RL tasks can be tested in this manner (due to the complexity of generating an expert policy for a reference maximum score).

This function requires the dataset to have two attributes which can be set when creating the dataset, `ref_min_score` and `ref_max_score`. These attributes are optional; if they are not present in the dataset, `get_normalized_score()` will throw an error. To set these attributes, they can be passed directly by the user when creating the dataset, or set by passing a callable expert policy from which to obtain the episode returns. Note that if `ref_min_score` is not passed, it will be computed internally with a random policy.

I'm also using this PR to fix an issue with storing dictionary observation spaces in the `DataCollectorV0` and not resetting the environment when `terminated` or `truncated`. I added a comment on the specific line changes.

This PR is a Draft because I need to update the PointMaze dataset generation tutorial to include the reference scores. There are some issues and differences with the D4RL PointMaze reference scores that I'll list here.
Type of change

Screenshots

Checklist:

- [ ] I have run the `pre-commit` checks with `pre-commit run --all-files` (see `CONTRIBUTING.md` instructions to set it up)
- [ ] I have run `pytest -v` and no errors are present.
- [ ] I have solved any possible warnings that `pytest -v` has generated that are related to my code, to the best of my knowledge.