This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Fix an index bug in BART prediction. #175

Merged
merged 2 commits into allenai:master on Dec 7, 2020

Conversation

wlhgtc (Contributor) commented Nov 24, 2020

The index should be i rather than 0 in the make_output_human_readable method.

wlhgtc (Contributor, Author) commented Nov 24, 2020

Also, I wonder why we add 2 special tokens when we start decoding (generation), in here.
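
(For context, the two-token seed in question looks roughly like the sketch below; the token ids are typical BART values and the snippet is illustrative, not the linked allennlp-models code.)

```python
import torch

# Rough illustration of the two-token decoder seed being asked about:
# generation starts from "</s> <s>" rather than just "<s>".
eos_token_id, bos_token_id = 2, 0  # typical BART ids, for illustration only
batch_size = 4
decoder_prefix = torch.tensor([[eos_token_id, bos_token_id]]).repeat(batch_size, 1)
print(decoder_prefix)  # shape (4, 2): [[2, 0], [2, 0], [2, 0], [2, 0]]
```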

epwalsh (Member) left a comment

Thanks @wlhgtc! Sorry for the delay on this. As for your question about the start tokens, @dirkgr might have some more context.

@@ -390,7 +390,7 @@ def make_output_human_readable(self, output_dict: Dict[str, torch.Tensor]) -> Di
         predicted_tokens = [None] * predictions.shape[0]
         for i in range(predictions.shape[0]):
             predicted_tokens[i] = self._indexer.indices_to_tokens(
-                {"token_ids": predictions[0].tolist()}, self.vocab
+                {"token_ids": predictions[i].tolist()}, self.vocab
Member:

Wow, what a bug

Contributor:

Indeed, I'm surprised I didn't catch it earlier. Perhaps I was only using batch size 1 when making predictions and missed it.
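
(A small sketch of why a batch size of 1 hides the bug; token ids are made up.)

```python
import torch

# With a single row, predictions[0] and predictions[i] are the same tensor,
# so the hard-coded 0 happens to be correct. With a larger batch, every
# element silently receives row 0's prediction.
predictions = torch.tensor([[5, 6, 7], [8, 9, 10]])  # batch of 2, made-up token ids
buggy = [predictions[0].tolist() for _ in range(predictions.shape[0])]
fixed = [predictions[i].tolist() for i in range(predictions.shape[0])]
print(buggy)  # [[5, 6, 7], [5, 6, 7]]   <- second element is wrong
print(fixed)  # [[5, 6, 7], [8, 9, 10]]
```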

dirkgr (Member) commented Dec 5, 2020

@wlhgtc, I suspect it's there because huggingface does it, and to make the weights work we have to do whatever huggingface does. But it's not in the original paper, and I can't find that part in the huggingface code either. @Tobias-Rohde, do you know why?

dirkgr (Member) commented Dec 7, 2020

@wlhgtc, does something bad happen when you take away the second special token? If not, I'd be in favor of removing it and retraining the model.

dirkgr (Member) commented Dec 7, 2020

I will merge this though, and we can handle the special token question in a separate PR.

dirkgr merged commit ea1f71c into allenai:master on Dec 7, 2020
Tobias-Rohde (Contributor) commented Dec 7, 2020

> @wlhgtc, I suspect it's there because huggingface does it, and to make the weights work we have to do whatever huggingface does. But it's not in the original paper, and I can't find that part in the huggingface code either. @Tobias-Rohde, do you know why?

Yes, it's for consistency with huggingface, but that's also what they did in BART, even though it might not be mentioned explicitly.
If I recall correctly, for some sequence <s> A B C </s>, they use </s> <s> A B C as decoder input to predict <s> A B C </s> as target.
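
(A minimal sketch of that shift, with made-up token ids; this illustrates the scheme described above rather than the actual fairseq or huggingface code.)

```python
import torch

def shift_right(target_ids: torch.Tensor, eos_token_id: int) -> torch.Tensor:
    # Shift the target one position to the right and place </s> in front,
    # so "<s> A B C </s>" becomes the decoder input "</s> <s> A B C".
    decoder_input = target_ids.new_zeros(target_ids.shape)
    decoder_input[:, 1:] = target_ids[:, :-1]
    decoder_input[:, 0] = eos_token_id
    return decoder_input

# Example with <s>=0, </s>=2, and A/B/C as 10/11/12:
target = torch.tensor([[0, 10, 11, 12, 2]])  # <s> A B C </s>
print(shift_right(target, eos_token_id=2))   # tensor([[ 2,  0, 10, 11, 12]])
```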

See here for the whole thread on this involving the BART authors:
facebookresearch/fairseq#1389

And here for the part where huggingface adds the token:
https://github.com/huggingface/transformers/blob/00aa9dbca29dcf0e3a624354ef5c80a8e5226339/src/transformers/generation_utils.py#L88

wlhgtc (Contributor, Author) commented Dec 8, 2020

> @wlhgtc, does something bad happen when you take away the second special token? If not, I'd be in favor of removing it and retraining the model.

Sorry for my late reply.
I modified the code for an mBART model. Since it's a multilingual model, huggingface adds a language symbol (like zh_CN for Chinese) at the start of the decoder input. I replaced <s> with zh_CN, and everything works well.
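
(A sketch of the described change, with a hypothetical id for the zh_CN language symbol; illustrative only, not the mBART implementation.)

```python
import torch

# Swap <s> for the target language code in the two-token decoder seed,
# so generation starts from "</s> zh_CN" instead of "</s> <s>".
eos_token_id = 2
zh_cn_token_id = 250_025  # hypothetical id for the zh_CN language symbol
batch_size = 4
decoder_prefix = torch.tensor([[eos_token_id, zh_cn_token_id]]).repeat(batch_size, 1)
print(decoder_prefix[0])  # tensor([     2, 250025])
```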

wlhgtc (Contributor, Author) commented Dec 8, 2020

@Tobias-Rohde
In fairseq it seems to be a teacher-forcing proxy: move_eos_to_beginning.
But in transformers only one token is added, in here.

dirkgr (Member) commented Dec 8, 2020

> And here for the part where huggingface adds the token:

@Tobias-Rohde, that code only adds one token though.

Tobias-Rohde (Contributor) commented Dec 9, 2020

> And here for the part where huggingface adds the token:
>
> @Tobias-Rohde, that code only adds one token though.

@dirkgr @wlhgtc
Ah, you're right, the code has changed quite a bit since I worked on it, but I just stepped through it with the debugger and I see what changed.

See this part of huggingface's beam search:

https://github.com/huggingface/transformers/blob/67ff1c314a61a2d5949b3bb48fa3ec7e9b697d7e/src/transformers/generation_utils.py#L984

which calls this when using Bart:

https://github.com/huggingface/transformers/blob/67ff1c314a61a2d5949b3bb48fa3ec7e9b697d7e/src/transformers/models/bart/modeling_bart.py#L1083

There it checks whether the current generated length is 1, i.e. EOS was generated (which happens in the function I linked earlier), and, if force_bos_token_to_be_generated is true, it adjusts the logits so that BOS is generated next. Interestingly, force_bos_token_to_be_generated seems to be disabled by default now.
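
(A minimal sketch of that logit adjustment, using a hand-rolled helper rather than the actual huggingface implementation; token and vocab ids are assumptions for illustration.)

```python
import torch

def force_bos_if_needed(logits: torch.Tensor, cur_len: int, bos_token_id: int,
                        force_bos_token_to_be_generated: bool) -> torch.Tensor:
    # After only </s> has been generated (cur_len == 1), mask every logit
    # except <s> so that <s> is guaranteed to be the next token.
    if force_bos_token_to_be_generated and cur_len == 1:
        forced = torch.full_like(logits, float("-inf"))
        forced[:, bos_token_id] = logits[:, bos_token_id]
        return forced
    return logits

# With forcing enabled, every batch element picks token 0 (<s>) at step 1.
logits = torch.randn(2, 50265)  # (batch, vocab); vocab size chosen for illustration
next_tokens = force_bos_if_needed(
    logits, cur_len=1, bos_token_id=0, force_bos_token_to_be_generated=True
).argmax(dim=-1)
print(next_tokens)  # tensor([0, 0])
```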

dirkgr (Member) commented Dec 10, 2020

What needs to be done then? Should we take out the second token?

What's the point of force-generating a token like that?

Tobias-Rohde (Contributor) commented Dec 14, 2020

> What needs to be done then? Should we take out the second token?
>
> What's the point of force-generating a token like that?

Originally I implemented it this way to reproduce huggingface's exact behavior while trying to figure out why I was getting worse performance on CNN/DM.

I think it would be fine to remove it, especially since force_bos_token_to_be_generated is False by default anyways.
