This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Fix an index bug in BART prediction. #175

Merged
merged 2 commits into allenai:master on Dec 7, 2020

Conversation

wlhgtc (Contributor) commented Nov 24, 2020

The index should be i rather than 0 in the make_output_human_readable method.

wlhgtc (Contributor, Author) commented Nov 24, 2020

Also, I wonder why we add 2 special tokens when we start decoding (generation), in here.
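
(For context, the two-token seed in question looks roughly like the sketch below; the token ids are typical BART values and the snippet is illustrative, not the linked allennlp-models code.)

```python
import torch

# Rough illustration of the two-token decoder seed being asked about:
# generation starts from "</s> <s>" rather than just "<s>".
eos_token_id, bos_token_id = 2, 0  # typical BART ids, for illustration only
batch_size = 4
decoder_prefix = torch.tensor([[eos_token_id, bos_token_id]]).repeat(batch_size, 1)
print(decoder_prefix)  # shape (4, 2): [[2, 0], [2, 0], [2, 0], [2, 0]]
```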

epwalsh (Member) left a comment

Thanks @wlhgtc! Sorry for the delay on this. As for your question about the start tokens, @dirkgr might have some more context.

@@ -390,7 +390,7 @@ def make_output_human_readable(self, output_dict: Dict[str, torch.Tensor]) -> Di
         predicted_tokens = [None] * predictions.shape[0]
         for i in range(predictions.shape[0]):
             predicted_tokens[i] = self._indexer.indices_to_tokens(
-                {"token_ids": predictions[0].tolist()}, self.vocab
+                {"token_ids": predictions[i].tolist()}, self.vocab
Member:

Wow, what a bug

Contributor:

Indeed, I'm surprised I didn't catch it earlier. Perhaps I was only using batch size 1 when making predictions and missed it.
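
(A small sketch of why a batch size of 1 hides the bug; token ids are made up.)

```python
import torch

# With a single row, predictions[0] and predictions[i] are the same tensor,
# so the hard-coded 0 happens to be correct. With a larger batch, every
# element silently receives row 0's prediction.
predictions = torch.tensor([[5, 6, 7], [8, 9, 10]])  # batch of 2, made-up token ids
buggy = [predictions[0].tolist() for _ in range(predictions.shape[0])]
fixed = [predictions[i].tolist() for i in range(predictions.shape[0])]
print(buggy)  # [[5, 6, 7], [5, 6, 7]]   <- second element is wrong
print(fixed)  # [[5, 6, 7], [8, 9, 10]]
```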

dirkgr (Member) commented Dec 5, 2020

@wlhgtc, I suspect it's there because huggingface does it, and to make the weights work we have to do whatever huggingface does. But it's not in the original paper, and I can't find that part in the huggingface code either. @Tobias-Rohde, do you know why?

dirkgr (Member) commented Dec 7, 2020

@wlhgtc, does something bad happen when you take away the second special token? If not, I'd be in favor of removing it and retraining the model.

dirkgr (Member) commented Dec 7, 2020

I will merge this though, and we can handle the special token question in a separate PR.

dirkgr merged commit ea1f71c into allenai:master on Dec 7, 2020
Tobias-Rohde (Contributor) commented Dec 7, 2020

> @wlhgtc, I suspect it's there because huggingface does it, and to make the weights work we have to do whatever huggingface does. But it's not in the original paper, and I can't find that part in the huggingface code either. @Tobias-Rohde, do you know why?

Yes, it's for consistency with huggingface, but that's also what they did in BART, even though it might not be mentioned explicitly.
If I recall correctly, for some sequence <s> A B C </s>, they use </s> <s> A B C as decoder input to predict <s> A B C </s> as target.
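
(A minimal sketch of that shift, with made-up token ids; this illustrates the scheme described above rather than the actual fairseq or huggingface code.)

```python
import torch

def shift_right(target_ids: torch.Tensor, eos_token_id: int) -> torch.Tensor:
    # Shift the target one position to the right and place </s> in front,
    # so "<s> A B C </s>" becomes the decoder input "</s> <s> A B C".
    decoder_input = target_ids.new_zeros(target_ids.shape)
    decoder_input[:, 1:] = target_ids[:, :-1]
    decoder_input[:, 0] = eos_token_id
    return decoder_input

# Example with <s>=0, </s>=2, and A/B/C as 10/11/12:
target = torch.tensor([[0, 10, 11, 12, 2]])  # <s> A B C </s>
print(shift_right(target, eos_token_id=2))   # tensor([[ 2,  0, 10, 11, 12]])
```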

See here for the whole thread on this involving the BART authors:
facebookresearch/fairseq#1389

And here for the part where huggingface adds the token:
https://github.com/huggingface/transformers/blob/00aa9dbca29dcf0e3a624354ef5c80a8e5226339/src/transformers/generation_utils.py#L88

wlhgtc (Contributor, Author) commented Dec 8, 2020

> @wlhgtc, does something bad happen when you take away the second special token? If not, I'd be in favor of removing it and retraining the model.

Sorry for my late reply.
I modified the code for an mBART model. Since it's a multilingual model, huggingface adds a language symbol (like zh_CN for Chinese) at the start of the decoder input. I replaced <s> with zh_CN, and everything works well.
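
(A sketch of the described change, with a hypothetical id for the zh_CN language symbol; illustrative only, not the mBART implementation.)

```python
import torch

# Swap <s> for the target language code in the two-token decoder seed,
# so generation starts from "</s> zh_CN" instead of "</s> <s>".
eos_token_id = 2
zh_cn_token_id = 250_025  # hypothetical id for the zh_CN language symbol
batch_size = 4
decoder_prefix = torch.tensor([[eos_token_id, zh_cn_token_id]]).repeat(batch_size, 1)
print(decoder_prefix[0])  # tensor([     2, 250025])
```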

wlhgtc (Contributor, Author) commented Dec 8, 2020

@Tobias-Rohde
In fairseq it seems to be a teacher-forcing proxy: move_eos_to_beginning.
But in transformers only one token is added, in here.

dirkgr (Member) commented Dec 8, 2020

> And here for the part where huggingface adds the token:

@Tobias-Rohde, that code only adds one token though.

Tobias-Rohde (Contributor) commented Dec 9, 2020

> And here for the part where huggingface adds the token:
>
> @Tobias-Rohde, that code only adds one token though.

@dirkgr @wlhgtc
Ah, you're right, the code has changed quite a bit since I worked on it, but I just stepped through it with the debugger and I see what changed.

See this part of huggingface's beam search:

https://github.com/huggingface/transformers/blob/67ff1c314a61a2d5949b3bb48fa3ec7e9b697d7e/src/transformers/generation_utils.py#L984

which calls this when using Bart:

https://github.com/huggingface/transformers/blob/67ff1c314a61a2d5949b3bb48fa3ec7e9b697d7e/src/transformers/models/bart/modeling_bart.py#L1083

There it checks whether the current generated length is 1, i.e. EOS was generated (which happens in the function I linked earlier), and, if force_bos_token_to_be_generated is true, it adjusts the logits so that BOS is generated next. Interestingly, force_bos_token_to_be_generated seems to be disabled by default now.
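
(A minimal sketch of that logit adjustment, using a hand-rolled helper rather than the actual huggingface implementation; token and vocab ids are assumptions for illustration.)

```python
import torch

def force_bos_if_needed(logits: torch.Tensor, cur_len: int, bos_token_id: int,
                        force_bos_token_to_be_generated: bool) -> torch.Tensor:
    # After only </s> has been generated (cur_len == 1), mask every logit
    # except <s> so that <s> is guaranteed to be the next token.
    if force_bos_token_to_be_generated and cur_len == 1:
        forced = torch.full_like(logits, float("-inf"))
        forced[:, bos_token_id] = logits[:, bos_token_id]
        return forced
    return logits

# With forcing enabled, every batch element picks token 0 (<s>) at step 1.
logits = torch.randn(2, 50265)  # (batch, vocab); vocab size chosen for illustration
next_tokens = force_bos_if_needed(
    logits, cur_len=1, bos_token_id=0, force_bos_token_to_be_generated=True
).argmax(dim=-1)
print(next_tokens)  # tensor([0, 0])
```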

dirkgr (Member) commented Dec 10, 2020

What needs to be done then? Should we take out the second token?

What's the point of force-generating a token like that?

Tobias-Rohde (Contributor) commented Dec 14, 2020

> What needs to be done then? Should we take out the second token?
>
> What's the point of force-generating a token like that?

Originally I implemented it this way to reproduce huggingface's exact behavior while trying to figure out why I was getting worse performance on CNN/DM.

I think it would be fine to remove it, especially since force_bos_token_to_be_generated is False by default anyways.
