
Document default trainer metrics #1914

Merged: 11 commits merged into microsoft:main from documentation/OA-AA on Mar 2, 2024

Conversation

@jdilger (Contributor) commented Feb 27, 2024

Expanded the description of the averaging reduction method for the segmentation metrics. Let me know if you'd like me to change anything, or if I should add the details on macro accuracy.

Closes #1874

@github-actions bot added the trainers (PyTorch Lightning trainers) label Feb 27, 2024
@adamjstewart (Collaborator) left a comment

Looks fantastic, maybe @robmarkcole can review?

Should we do this for all of our LightningModules? Basically just document the default metrics we use.

@adamjstewart changed the title from "Documentation/oa aa" to "Document default trainer metrics" Feb 28, 2024
@adamjstewart added this to the 0.5.2 milestone Feb 28, 2024
@robmarkcole (Contributor)

I feel this is a particularly nuanced issue so have expanded slightly:

"""
Sets up segmentation accuracy metrics:

- Multiclass Pixel Accuracy: Ratio of correctly classified pixels.
- Multiclass Jaccard Index (IoU): Per-pixel overlap between predicted and
  actual segments.

Uses 'micro' averaging, aggregating pixel predictions across classes. Each
pixel is weighted equally, favoring classes with more pixels. This may skew
performance metrics towards majority classes due to class imbalance.

Note:
- 'Micro' averaging suits overall performance evaluation but may not reflect
  minority class accuracy.
- 'Macro' averaging, not used here, gives equal weight to each class, useful
  for balanced performance assessment across imbalanced classes.
"""

@adamjstewart (Collaborator)

@jdilger do you think you can produce a similarly detailed description for the accuracy metrics used in the other trainers? If not, maybe @robmarkcole and I can help.

@jdilger (Contributor, Author) commented Feb 28, 2024

Thanks @robmarkcole, the more nuanced version looks great. @adamjstewart Sure thing. I can take a first pass at it.

@adamjstewart (Collaborator)

We're likely going to release v0.5.2 tomorrow or Saturday, so if you can add these today that would be great. If not, we may merge this anyway and then we can add more details for more trainers in a follow-up PR. Thanks again for your help!

@robmarkcole (Contributor)

LGTM. Only a minor suggestion for consistency

Sets up segmentation accuracy metrics >> Initialize the metrics

@jdilger (Contributor, Author) commented Feb 29, 2024

I have time to work on this today; in the meantime, I went ahead and pushed the suggested change from @robmarkcole.

I noticed that the metrics for the classification and regression trainers are set up with dicts, while segmentation and detection use lists. I'm not sure it needs to be addressed now, but defining some standard names might be helpful. For instance, the segmentation trainer uses [MulticlassAccuracy(..., average="micro")] while the classification trainer uses {'OverallAccuracy': MulticlassAccuracy(..., average="micro")}.
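A hedged sketch (not torchgeo's exact code) of the naming difference being discussed, using torchmetrics' MetricCollection, which accepts either a list (keys default to the metric class name) or a dict (keys are chosen explicitly):

```python
from torchmetrics import MetricCollection
from torchmetrics.classification import MulticlassAccuracy

num_classes = 5  # hypothetical value, for illustration only

# Segmentation/detection style: a list, so the logged key defaults to "MulticlassAccuracy".
list_style = MetricCollection([MulticlassAccuracy(num_classes=num_classes, average="micro")])

# Classification/regression style: a dict, so the logged key is the explicit "OverallAccuracy".
dict_style = MetricCollection(
    {"OverallAccuracy": MulticlassAccuracy(num_classes=num_classes, average="micro")}
)

print(list(list_style.keys()), list(dict_style.keys()))
```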

@microsoft-github-policy-service agree

@adamjstewart (Collaborator)

There are some formatting issues with lists that are causing the documentation build to fail. We should be using the unordered-list format described at https://sublime-and-sphinx-guide.readthedocs.io/en/latest/lists.html#unordered-lists and indenting lines that stretch across multiple lines. Also, we could use a note: https://sublime-and-sphinx-guide.readthedocs.io/en/latest/notes_warnings.html.
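A minimal sketch (assumed wording, not the PR's actual docstring) of the two rST conventions mentioned: an unordered list whose continuation lines are indented to align, and a note directive.

```python
def configure_metrics() -> None:  # hypothetical method name, used only for illustration
    """Initialize the metrics.

    * Multiclass Pixel Accuracy: ratio of correctly classified pixels, with the
      continuation line indented so Sphinx keeps it inside the list item.
    * Multiclass Jaccard Index (IoU): per-pixel overlap between predicted and
      actual segments.

    .. note::
       'Micro' averaging suits overall performance evaluation but may not
       reflect minority class accuracy.
    """
```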

@adamjstewart (Collaborator)

Also I'm happy to handle the formatting stuff myself if you want to focus on adding something to the other trainers. For better or for worse, I know most of the quirks of Sphinx/rST now.

@jdilger (Contributor, Author) commented Feb 29, 2024

@adamjstewart I think I figured it out, but do double-check 😄 . If there are still issues I'm happy to let you take it over. I don't have too much experience with Sphinx.

Side note: in contributing.rst, the docs install command is written as pip install .[docs], which didn't work for me; I had to use pip install '.[docs]'. I don't know if it's a typo or related to the OS I'm using (Mac).

@adamjstewart (Collaborator)

Hmm, quotes shouldn't be required for pip. What error message did you get? What shell are you using, bash?

@jdilger (Contributor, Author) commented Feb 29, 2024

Ah, you're 100% right. I'm using zsh, which gave the error zsh: no matches found: .[docs]

@adamjstewart (Collaborator) left a comment

Formatting looks good now. If we can do this for all other trainers by tomorrow, great. If not, we'll still merge and add more details after the release.

Review comment on torchgeo/trainers/segmentation.py (outdated, resolved)
@jdilger (Contributor, Author) commented Feb 29, 2024

Added a first pass at the metrics documentation for the other trainers. The formatting should be correct, but please double-check the content :).

@adamjstewart (Collaborator)

> Ah, you're 100% right. I'm using zsh which gave the error zsh: no matches found: .[docs]

It looks like zsh uses square brackets for pattern matching. Feel free to open a PR to add quotes around all use of square brackets in Installation and Contributing.

@adamjstewart previously approved these changes Mar 2, 2024

@adamjstewart (Collaborator) left a comment

@robmarkcole last chance to review

@robmarkcole (Contributor)

For the classification metrics, are they per pixel or per image?

@adamjstewart (Collaborator)

Classification is per-image. I'm not actually sure whether micro and macro are different in that context.

@robmarkcole (Contributor)

I think we expect them to be different for imbalanced datasets. The metrics are computed not at the pixel level but at the image level, so for classification the terms should be total true positives, false negatives, and false positives.
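To illustrate (a toy sketch, not taken from the PR): at the image level with torchmetrics' MulticlassAccuracy, 'micro' and 'macro' already diverge on an imbalanced batch of labels.

```python
import torch
from torchmetrics.classification import MulticlassAccuracy

# 6 images, 3 classes; class 0 dominates the batch.
target = torch.tensor([0, 0, 0, 0, 1, 2])
preds = torch.tensor([0, 0, 0, 0, 2, 1])  # majority class correct, both minority classes wrong

micro = MulticlassAccuracy(num_classes=3, average="micro")
macro = MulticlassAccuracy(num_classes=3, average="macro")

print(micro(preds, target))  # 4/6 ≈ 0.67: every image counts equally, so the majority class dominates
print(macro(preds, target))  # (1.0 + 0.0 + 0.0) / 3 ≈ 0.33: every class counts equally
```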

@isaaccorley force-pushed the documentation/OA-AA branch from 96e62e2 to 11cc51e on March 2, 2024 20:38
@isaaccorley modified the milestones: 0.5.2, 0.5.3 Mar 2, 2024
@adamjstewart modified the milestones: 0.5.3, 0.6.0, 0.5.2 Mar 2, 2024
@adamjstewart (Collaborator)

Want to squeeze this into the next release, but feel free to open another PR to update these docs or change the default metrics we use after the release!

@adamjstewart merged commit 7241b0f into microsoft:main Mar 2, 2024
24 checks passed
isaaccorley pushed a commit that referenced this pull request Mar 2, 2024
* Expand metrics documentation

* typo

* Update documentation following robmarkcole suggestion

* Update summary to be consistent with other trainers

* Move to unordered lists, fix indentation, use note

* Update configure metrics for other trainers

* typo

* Update torchgeo/trainers/classification.py

Co-authored-by: Adam J. Stewart <[email protected]>

* Add detail on wanted values, reword macro note.

* Remove redundant paragraph

* Add acronyms, clarify regression metrics

---------

Co-authored-by: Adam J. Stewart <[email protected]>
Labels: trainers (PyTorch Lightning trainers)
Development

Successfully merging this pull request may close these issues.

Document significance of macro vs micro averaging
4 participants