Add basic eval table logging for WandbCallback #31050
Conversation
I'm not sure how to get the failing tests to pass. Failing tests are:

This PR also makes 2 changes to the trainer internals that could be split into their own PRs:
Hi @amyeroberts! Sorry to bother -- are you the right person to take a look at this?
Hi @andrewtruong, thanks for opening this PR! I'm going to bring in @muellerzr here, who knows far more about Trainer and its interactions with W&B. One thing to note is that integrations like W&B aren't actively maintained by the Hugging Face team. Looking at the PR, I'm not sure we want to add attributes to core objects like Trainer to enable feature integrations, unless it would unlock other things (here @muellerzr can definitely advise!)
Thanks! I'm hoping these are small and reasonable changes:

Hey @muellerzr, any chance you can take a look this week? I'd love to get this in :)
On paper this seems okay; let's try our best to come up with a solution that doesn't involve self-referencing the trainer, if possible, though, please.
return EvalLoopOutput(
    predictions=all_preds, label_ids=all_labels, metrics=metrics, num_samples=num_samples, inputs=all_inputs
)
Some things to be very careful with: I'd appreciate checking the memory usage before/after this change, to make sure we don't have a memory leak and we don't increase the VRAM used by the user by utilizing this.
Any profiling tools you recommend / would want to see?
wandb logs work just fine :)
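For a quick before/after check of peak VRAM during evaluation, a minimal sketch (assuming a single CUDA device and an already-built trainer; not part of the PR):

import torch

# Reset the peak-memory counter, run one eval pass, then read the high-water mark.
torch.cuda.reset_peak_memory_stats()
trainer.evaluate()
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM during evaluation: {peak_gib:.2f} GiB")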
What's the update here?
@@ -933,6 +984,36 @@ def on_predict(self, args, state, control, metrics, **kwargs):
        metrics = rewrite_logs(metrics)
        self._wandb.log(metrics)

    def on_evaluate(self, args, state, control, **kwargs):
        if os.getenv("WANDB_LOG_EVALS"):
            eval_loop_output = self.trainer.eval_loop_output
Where is eval_loop_output coming from exactly?
Thanks for catching, I missed this commit. It's added here: b8d5c6e
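For context, a rough sketch of what that commit presumably does inside Trainer.evaluate() (the attribute name matches the diff above; everything else here is an assumption):

# Inside Trainer.evaluate(), after running the eval loop:
output = self.evaluation_loop(eval_dataloader, description="Evaluation")
# Stash the full output on the Trainer so callbacks can read it in on_evaluate.
self.eval_loop_output = output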
def init_callback(cb):
    cb.trainer = self
    return cb

callbacks = [init_callback(cb) for cb in callbacks]
Is there a better way we can do this rather than adding the trainer to self here? I'm not a fan of this because otherwise, if users are saving states, they have an entire reference to the trainer in there, which is not good.
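One conceivable alternative, sketched here as an assumption rather than anything in the PR: hold the trainer behind a weak reference so the callback never keeps the Trainer alive on its own.

import weakref

def init_callback(cb):
    # Callbacks dereference with cb.trainer_ref(); this returns None once the
    # Trainer is garbage-collected, and the callback never pins the Trainer.
    cb.trainer_ref = weakref.ref(self)
    return cb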
I didn't love this either, but it's a tradeoff.

- The current Trainer reporting interface looks like report_to="wandb", which is fine if your callback doesn't need any args/kwargs, but in this case we do (the tokenizer, dataset, etc.) and the one object that has all of these is the Trainer.
- The alternative is also implemented in this PR, but you can't pass report_to="wandb". I think, counter-intuitively, you have to NOT report to wandb. Instead, you need to instantiate a callback and manually pass it in -- not the end of the world, but it didn't seem idiomatic:

trainer = Trainer(...)
wandb_callback = WandbCallback(..., tokenizer=..., dataset=...)
trainer.add_callback(wandb_callback)
I'd much prefer the non-idiomatic way please
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
whoops, just realized this was stuck in "pending review" instead of actually posting.
Force-pushed from 4235289 to a80db7a
Hey @muellerzr, any feedback?
Thanks! This looks much better :)
@andrewtruong can you fix the conflicts, then @amyeroberts can give it a final review!
Thanks for working on this!
At the moment, there are too many hard-coded assumptions that the model is a text-based model. I'd suggest maybe a new, text-specific callback class in which we can safely make these assumptions.
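A rough sketch of that suggestion (the class name, constructor signature, and defaults are all hypothetical):

from transformers.integrations import WandbCallback

class WandbTextEvalCallback(WandbCallback):
    # Text-specific variant: here it is safe to assume a tokenizer exists and
    # that predictions are token ids that can be decoded into strings.
    def __init__(self, tokenizer=None, dataset=None, num_samples=100, freq=1):
        super().__init__()
        self.tokenizer = tokenizer
        self.dataset = dataset
        self.num_samples = num_samples
        self.freq = freq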
return EvalLoopOutput(
    predictions=all_preds, label_ids=all_labels, metrics=metrics, num_samples=num_samples, inputs=all_inputs
)
What's the update here?
self.trainer = trainer

if tokenizer is None:
    tokenizer = self.trainer.tokenizer
This assumes trainer is not None.
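A defensive version of that fallback, as a sketch (the getattr guard is the only addition):

# Only fall back to the trainer's tokenizer if a trainer was actually attached.
if tokenizer is None and getattr(self, "trainer", None) is not None:
    tokenizer = self.trainer.tokenizer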
if tokenizer is None:
    tokenizer = self.trainer.tokenizer
self.tokenizer = tokenizer
Do we want to set this even if tokenizer is None?
self.tokenizer = tokenizer

if dataset is None:
    dataset = self.trainer.eval_dataset
Same here - this assumes self.trainer and self.trainer.eval_dataset are not None.
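One way to address this, sketched as an assumption: fail fast with a clear error instead of silently assuming a trainer is attached:

if dataset is None:
    if getattr(self, "trainer", None) is None:
        raise ValueError(
            "Eval-table logging needs either an explicit dataset "
            "or an attached trainer with an eval_dataset."
        )
    dataset = self.trainer.eval_dataset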
""" | ||
|
||
def __init__(self): | ||
def __init__( |
There should be a docstring for all these args - in particular num_samples and freq, which aren't obvious.
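Something along these lines would cover it (the defaults shown are hypothetical):

def __init__(self, tokenizer=None, dataset=None, num_samples=100, freq=1):
    """
    Args:
        num_samples (int, optional, defaults to 100):
            Number of rows sampled from the eval dataset into the logged table.
        freq (int, optional, defaults to 1):
            Log the eval table every freq calls to trainer.evaluate().
    """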
try:
    sampled_dataset = dataset.select(range(num_samples))
except IndexError as e:
    print(f"WARNING: Could not get those indices: {e=}")
Could we make this a bit clearer? The user never specifies indices, so it's a bit weird to refer to them as "those indices". Maybe something along the lines of "Could not select {num_samples=} rows from the dataset".
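Applied to the snippet above, that might look like this (switching from print to logging is an extra assumption):

import logging

logger = logging.getLogger(__name__)

try:
    sampled_dataset = dataset.select(range(num_samples))
except IndexError:
    # Clearer message: refer to the requested row count, not "those indices".
    logger.warning(f"Could not select {num_samples=} rows from the dataset; falling back to the full dataset.")
    sampled_dataset = dataset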
dataset = self.trainer.eval_dataset

try:
    sampled_dataset = dataset.select(range(num_samples))
This assumes dataset is not None.
print(f"WARNING: Could not get those indices: {e=}") | ||
sampled_dataset = dataset | ||
|
||
self.sample_dataset = sampled_dataset |
Do we want to store both the full and sampled dataset?
if ignore_tokens is None:
    ignore_tokens = [-100]

padding_token_id = self.tokenizer.pad_token_id
This assumes tokenizer is not None. Confusingly, the tokenizer for Trainer may not be a tokenizer at all, cf. #32385.
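A guard for that case, sketched here (the error message is an assumption):

# Trainer.tokenizer can be a processor rather than a tokenizer (cf. #32385),
# in which case there is no pad_token_id to fall back on.
padding_token_id = getattr(self.tokenizer, "pad_token_id", None)
if padding_token_id is None:
    raise ValueError("Eval-table logging requires a tokenizer with a pad_token_id.")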
    a[mask] = padding_token_id
    return a

self._replace_ignored_tokens_func = replace_ignored_tokens
Why do we need this functionality in the callback?
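For context, a hedged reconstruction of the full helper based on the fragment above (the lines before a[mask] are an assumption; ignore_tokens and padding_token_id come from the enclosing __init__ scope):

import numpy as np

def replace_ignored_tokens(a: np.ndarray) -> np.ndarray:
    # Swap ignored label ids (e.g. -100) for the pad token id so the
    # sequences can be decoded with the tokenizer.
    a = a.copy()
    mask = np.isin(a, ignore_tokens)
    a[mask] = padding_token_id
    return a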
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
This PR adds basic support for logging raw evals using WandbCallback:

- trainer.evaluate() will log an eval table with inputs and outputs; and
- the behavior is opt-in via the WANDB_LOG_EVALS env var.

This also adds some changes to trainer internals to support this, including:

- extending EvalLoopOutput to include inputs

Here's an example table that gets logged when the user calls trainer.evaluate():

[Screenshot: example eval table logged to W&B]
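Minimal usage, per the description above (a sketch; model/dataset setup is elided, and "1" is just any truthy value since the callback only checks os.getenv):

import os

from transformers import Trainer

# Opt in to eval-table logging (the callback checks os.getenv("WANDB_LOG_EVALS")).
os.environ["WANDB_LOG_EVALS"] = "1"

# training_args is assumed to have report_to="wandb" so WandbCallback is active.
trainer = Trainer(model=model, args=training_args, eval_dataset=eval_dataset)
trainer.evaluate()  # also logs an eval table of inputs/outputs to W&B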
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@amyeroberts are you the right person to review?