Conversation
wlhgtc commented Jul 12, 2020
- support multi-layer decoders
- return all top_k_predictions (tokens) from beam_search
- support loading pre-trained embedding files for the target embedding
Can you run the auto formatter and make sure all the tests pass?
It's hard to review with random formatting changes everywhere, and I suspect that this code doesn't work yet.
target_embedding_dim: int = None,
scheduled_sampling_ratio: float = 0.0,
use_bleu: bool = True,
bleu_ngram_weights: Iterable[float] = (0.25, 0.25, 0.25, 0.25),
Can you add the new parameters at the end, so that the code stays backwards compatible?
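A minimal sketch of the ordering being asked for, using a toy stand-in class rather than the real model (the actual constructor has more parameters than shown here): existing parameters keep their positions, and new ones are appended at the end with defaults so existing positional calls keep working.

```python
from typing import Iterable


class Seq2SeqLike:
    """Toy stand-in illustrating parameter ordering, not the actual model class."""

    def __init__(
        self,
        max_decoding_steps: int,
        attention=None,
        beam_size: int = None,
        target_embedding_dim: int = None,
        scheduled_sampling_ratio: float = 0.0,
        use_bleu: bool = True,
        # New parameters come after every pre-existing one and carry defaults,
        # so an existing positional call like Seq2SeqLike(50, None, 5) still works.
        bleu_ngram_weights: Iterable[float] = (0.25, 0.25, 0.25, 0.25),
        target_decoder_layers: int = 1,
    ) -> None:
        self.bleu_ngram_weights = bleu_ngram_weights
        self.target_decoder_layers = target_decoder_layers
```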
Finished!
) -> None:
    super().__init__(vocab)
    self.source_embedding_dim = source_embedding_dim
Why no underscore before _source_embedding_dim?
Why did you add this parameter at all? Isn't it possible to get the source embedding dimension from the source embedder, without having to specify it? Also, if it's added here it needs to be added to the documentation.
This parameter is useful when you have extra features.
Suppose you have both word embeddings (600 dim) and POS tag embeddings (600 dim). The combined embedding then becomes (batch, length, 1200), but the encoder accepts 600-dim tensors, so you either project with a linear layer or add the two directly (BERT-style). This parameter helps with that.
But I haven't finished all of the "feature_merge" code in an elegant way, so I removed it.
I will add some modules to finish it in the future.
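A rough sketch of the kind of feature merging described above (the module and its names are illustrative, not code from this PR): concatenate the 600-dim word and POS embeddings into a 1200-dim tensor and project back down to the encoder's 600 dims, or simply add them.

```python
import torch
from torch import nn


class FeatureMerge(nn.Module):
    """Merge word and POS-tag embeddings so the result matches the encoder input dim."""

    def __init__(self, word_dim: int = 600, pos_dim: int = 600,
                 encoder_dim: int = 600, mode: str = "project") -> None:
        super().__init__()
        self.mode = mode
        # Used only in "project" mode: (word_dim + pos_dim) -> encoder_dim.
        self.projection = nn.Linear(word_dim + pos_dim, encoder_dim)

    def forward(self, word_emb: torch.Tensor, pos_emb: torch.Tensor) -> torch.Tensor:
        if self.mode == "add":
            # Element-wise sum, BERT-style; requires word_dim == pos_dim == encoder_dim.
            return word_emb + pos_emb
        # (batch, length, word_dim + pos_dim) -> (batch, length, encoder_dim)
        return self.projection(torch.cat([word_emb, pos_emb], dim=-1))
```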
max_decoding_steps: int,
attention: Attention = None,
beam_size: int = None,
decoder_layers: int = 2,
Should this be called target_decoder_layers, and default to 1?
Finished.
# if len(indices.shape) > 1:
#     indices = indices[0]
batch_predicted_tokens = []
for indices in top_k_predictions:
Why is the extra loop necessary now?
The original code only returned the top-1 result from beam search. That's not convenient if we want to evaluate a top-5 score (or pick a result with some hand-crafted algorithm), so I used the code segment from CopyNet to get all of the results.
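A sketch of what returning every beam looks like, assuming top_k_predictions has shape (batch_size, beam_size, max_length) and an end_index marking the end symbol (both names are assumptions, not taken from the PR):

```python
from typing import Dict, List

import torch


def beams_to_tokens(
    top_k_predictions: torch.Tensor,   # (batch_size, beam_size, max_length)
    index_to_token: Dict[int, str],
    end_index: int,
) -> List[List[List[str]]]:
    """Turn every beam of every batch element into a list of token strings."""
    all_predicted_tokens = []
    for top_k in top_k_predictions:            # over the batch
        batch_predicted_tokens = []
        for indices in top_k:                  # over the beams (the "extra loop")
            ids = indices.tolist()
            if end_index in ids:
                ids = ids[: ids.index(end_index)]   # stop at the end symbol
            batch_predicted_tokens.append([index_to_token[i] for i in ids])
        all_predicted_tokens.append(batch_predicted_tokens)
    return all_predicted_tokens
```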
1. fix formatting issues 2. rename (and remove) some parameters
The pretrained tests were broken in
I have the same question here that I have for allenai/allennlp#4462: Would it not be easier to flatten/unflatten the decoder state in the model, so that from the outside it looks exactly the same, and all existing code that works with encoder/decoder models doesn't need any changes?
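A minimal sketch of the flatten/unflatten idea being suggested (shapes and helper names are assumptions, not code from either PR): reshape the (num_layers, batch, hidden) LSTM state to (batch, num_layers * hidden) so outside code still sees a single 2-D state tensor, and reshape it back before the next decoder step.

```python
import torch


def flatten_decoder_state(state: torch.Tensor) -> torch.Tensor:
    """(num_layers, batch, hidden) -> (batch, num_layers * hidden)."""
    num_layers, batch, hidden = state.shape
    return state.transpose(0, 1).reshape(batch, num_layers * hidden)


def unflatten_decoder_state(state: torch.Tensor, num_layers: int) -> torch.Tensor:
    """(batch, num_layers * hidden) -> (num_layers, batch, hidden)."""
    batch, flat = state.shape
    hidden = flat // num_layers
    return state.reshape(batch, num_layers, hidden).transpose(0, 1).contiguous()
```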
@dirkgr After thinking carefully about your advice, I prefer the first option; for the second one we would need to repeat the same code in many models.
@matt-gardner Sorry, it's on my side. This PR aims to support a multi-layer decoder in the seq2seq model.
The code here will run against the latest master version of allennlp, but you might have to merge/resolve conflicts here before that happens.
step : `int`
    The time step in beam search decoding.

>>>>>>> 5d9098f6084a12da77b02d40e0d9392113aeb805
You checked in some unmerged files. I don't think it's serious, but we can't merge it like this.
fixed
for predicted_token in predicted_tokens:
    assert all(isinstance(x, str) for x in predicted_token)
predicted_tokens is now a list of lists?
Yes, it contains the top-n sequences; you can see it here.
I don't know about the SSH failure. @epwalsh, is it possible that this test can never succeed when the PR comes from a fork?
# Conflicts:
#   allennlp_models/generation/models/simple_seq2seq.py
Just fixed the SSH issue with the docs. There was another build error in that job but it was because of bad formatting in a docstring. I think my suggestion would fix that though.
fix doc format
Co-authored-by: Evan Pete Walsh <[email protected]>
@epwalsh Thanks for your advice. Now all tests pass; can we merge it into master?
Thanks for sticking with it!