This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Update simple_seq2seq.py #90

Merged (27 commits, Aug 18, 2020)

Conversation

Contributor

@wlhgtc commented Jul 12, 2020

  1. Support multi-layer decoders
  2. Return all top_k_predictions (tokens) for beam search
  3. Support loading pre-trained embedding files for the target embedding

Member

@dirkgr left a comment

Can you run the auto formatter and make sure all the tests pass?

It's hard to review with random formatting changes everywhere, and I suspect that this code doesn't work yet.

target_embedding_dim: int = None,
scheduled_sampling_ratio: float = 0.0,
use_bleu: bool = True,
bleu_ngram_weights: Iterable[float] = (0.25, 0.25, 0.25, 0.25),
Member

Can you add the new parameters at the end, so that the code stays backwards compatible?
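As a minimal illustration of the backwards-compatibility point (hypothetical, simplified signatures; build_model and build_model_v2 are not names from this PR): appending the new keyword arguments, each with a default, keeps every existing call site working.

    from typing import Iterable, Optional

    # Before: existing callers may pass these positionally.
    def build_model(max_decoding_steps: int, beam_size: Optional[int] = None, use_bleu: bool = True):
        ...

    # After: new options go at the end, with defaults, so old calls are unaffected.
    def build_model_v2(
        max_decoding_steps: int,
        beam_size: Optional[int] = None,
        use_bleu: bool = True,
        bleu_ngram_weights: Iterable[float] = (0.25, 0.25, 0.25, 0.25),  # new, appended
        target_decoder_layers: int = 1,                                  # new, appended
    ):
        ...

    build_model_v2(50, 5, True)  # a pre-existing positional call keeps its meaning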

Contributor Author

Finished!

) -> None:
super().__init__(vocab)
self.source_embedding_dim = source_embedding_dim
Member

Why no underscore before _source_embedding_dim?

Member

Why did you add this parameter at all? Isn't it possible to get the source embedding dimension from the source embedder, without having to specify it? Also, if it's added here it needs to be added to the documentation.

Contributor Author

This parameter is useful when you have extra features.
Suppose you have both word embeddings (600-dim) and POS tags (600-dim). Then the combined embedding becomes (batch, length, 1200), but the encoder accepts 600-dim tensors. You can then either use a linear layer or add the embeddings directly (BERT-style), and this parameter helps with that.
But I haven't finished all of the "feature_merge" code in an elegant way, so I removed it.
I will add some modules to finish it in the future.
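For concreteness, a plain PyTorch sketch of the two merge styles described above (dimensions and variable names are illustrative, not code from this PR):

    import torch
    import torch.nn as nn

    batch, length, dim = 4, 10, 600
    word_emb = torch.randn(batch, length, dim)  # word embeddings, 600-dim
    pos_emb = torch.randn(batch, length, dim)   # POS-tag embeddings, 600-dim

    # Style 1: concatenate to (batch, length, 1200), then project back down to
    # the 600-dim input size the encoder expects.
    projection = nn.Linear(2 * dim, dim)
    merged_concat = projection(torch.cat([word_emb, pos_emb], dim=-1))

    # Style 2: add the feature embeddings element-wise (BERT-style), which keeps
    # the dimension at 600 without an extra projection layer.
    merged_sum = word_emb + pos_emb

    assert merged_concat.shape == merged_sum.shape == (batch, length, dim)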

max_decoding_steps: int,
attention: Attention = None,
beam_size: int = None,
decoder_layers: int = 2,
Member

Should this be called target_decoder_layers, and default to 1?

Contributor Author

Finished.

# if len(indices.shape) > 1:
# indices = indices[0]
batch_predicted_tokens = []
for indices in top_k_predictions:
Member

Why is the extra loop necessary now?

Contributor Author

The original code only returned the top-1 result from beam search. That's not convenient if we want to evaluate a top-5 score (or choose a result with some hand-crafted algorithm), so I used the code segment from CopyNet to get all the results.
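For concreteness, a self-contained sketch of that idea with a toy vocabulary (END_INDEX and vocab are illustrative, not the model's actual attributes): loop over every beam per instance, map indices to tokens, and truncate at the end symbol.

    import torch

    END_INDEX = 3
    vocab = {0: "@start@", 1: "the", 2: "cat", 3: "@end@", 4: "sat"}

    # top_k_predictions: (batch_size, beam_size, max_steps) from beam search.
    top_k_predictions = torch.tensor(
        [[[1, 2, 4, 3], [1, 2, 3, 0]],
         [[2, 4, 3, 0], [1, 4, 2, 3]]]
    )

    batch_predicted_tokens = []
    for instance_predictions in top_k_predictions:     # over the batch
        beams = []
        for indices in instance_predictions:           # over every beam, not just beam 0
            indices = indices.tolist()
            if END_INDEX in indices:                   # truncate at the end symbol
                indices = indices[: indices.index(END_INDEX)]
            beams.append([vocab[i] for i in indices])
        batch_predicted_tokens.append(beams)

    # batch_predicted_tokens[0] == [["the", "cat", "sat"], ["the", "cat"]]
    print(batch_predicted_tokens)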

Contributor Author

@wlhgtc commented Jul 13, 2020

@dirkgr All tests pass except one case with target_decoder_layers = 2. We have to update the beam search code in #4462 first; then it will pass.
But I wonder why I need the CI / Pretrained Models (pull_request) test? It has no relation to pretrained models.

Member

@dirkgr commented Jul 15, 2020

The pretrained tests were broken in master when you were doing this. The failure has nothing to do with you. You can fix it by merging from master again. Sorry about that.

Member

@dirkgr left a comment

I have the same question here that I have for allenai/allennlp#4462: Would it not be easier to flatten/unflatten the decoder state in the model, so that from the outside it looks exactly the same, and all existing code that works with encoder/decoder models doesn't need any changes?

Contributor Author

@wlhgtc commented Jul 16, 2020

@dirkgr After thinking carefully about your advice, I see two different ways:

  1. Change the code in beam search, just as I did in allennlp/#4462.
  2. Adjust the code of every LSTM decoder model in its _forward_beam_search function, i.e. around calls such as:

       all_top_k_predictions, log_probabilities = self._beam_search.search(
           start_predictions, state, self.take_search_step
       )

       all_top_k_predictions, log_probabilities = self._beam_search.search(
           start_predictions, state, self.take_step
       )

     The code would then look like this (sketched more concretely below):

       # flatten operations
       ....

       self._beam_search.search(...)

       # unflatten operations
       ....

I prefer the first one; for the second one we would need to repeat the same code in many models.
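For concreteness, a plain PyTorch sketch of the flatten/unflatten wrapping that option 2 describes (shapes and variable names are illustrative): the multi-layer state of shape (num_layers, batch, hidden) is reshaped to (batch, num_layers * hidden) so it looks like a single-layer state to the beam search, then restored afterwards.

    import torch

    num_layers, batch_size, hidden_dim = 2, 4, 8
    decoder_hidden = torch.randn(num_layers, batch_size, hidden_dim)

    # Flatten: (num_layers, batch, hidden) -> (batch, num_layers * hidden),
    # so the state tensor is 2-D, as the existing beam search expects.
    flat_hidden = decoder_hidden.transpose(0, 1).reshape(batch_size, -1)

    # ... self._beam_search.search(...) would run here on the flattened state ...

    # Unflatten: (batch, num_layers * hidden) -> (num_layers, batch, hidden)
    # before feeding the state back into the multi-layer LSTM decoder.
    restored = flat_hidden.reshape(batch_size, num_layers, hidden_dim).transpose(0, 1)

    assert torch.equal(restored, decoder_hidden)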

Contributor

@matt-gardner

@wlhgtc, sorry, @dirkgr is out for a little bit, and we aren't sure if this is waiting on us or on you. Can you give a quick update on where you think this stands?

Contributor Author

@wlhgtc commented Aug 2, 2020

@matt-gardner Sorry, it's on my side. This PR aims to support a multi-layer decoder in the seq2seq model.
I have finished the changes, but the tests failed because they need an update to beam search (#4462). That PR has now been merged into master.
But I don't know whether the test code here runs against the allennlp code in master or against the newest released version?
I will adjust my code and try again in the next few days.

Member

@dirkgr commented Aug 4, 2020

The code here will run against the latest master version of allennlp, but you might have to merge/resolve conflicts here before that happens.

Contributor Author

@wlhgtc commented Aug 7, 2020

@dirkgr I re-pushed my code and all the test cases passed.
But there seems to be an error about an SSH key here; could you help me fix it?

step : `int`
The time step in beam search decoding.

>>>>>>> 5d9098f6084a12da77b02d40e0d9392113aeb805
Member

You checked in some unmerged files. I don't think it's serious, but we can't merge it like this.

Contributor Author

fixed

Comment on lines +22 to +23
for predicted_token in predicted_tokens:
assert all(isinstance(x, str) for x in predicted_token)
Member

predicted_tokens is now a list of lists?

Contributor Author

Yes, it contains the top-n sequences; you can see it here.
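For illustration (toy values, not the actual test fixture), the per-instance structure that the assertion above checks looks like this:

    # Each instance's predicted_tokens now holds its top-n decoded sequences,
    # each sequence being a list of string tokens.
    predicted_tokens = [
        ["the", "cat", "sat"],  # best beam
        ["a", "cat", "sat"],    # second-best beam
    ]
    for predicted_token in predicted_tokens:
        assert all(isinstance(x, str) for x in predicted_token)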

Member

@dirkgr commented Aug 7, 2020

I don't know about the ssh failure. @epwalsh, is it possible that this test can never succeed when the PR comes from a fork?

Member

@epwalsh commented Aug 11, 2020

Just fixed the SSH issue with the docs. There was another build error in that job but it was because of bad formatting in a docstring. I think my suggestion would fix that though.

Contributor Author

@wlhgtc commented Aug 12, 2020

@epwalsh Thanks for your advice. Now all the tests pass; can we merge it into master?

Member

@dirkgr commented Aug 18, 2020

Thanks for sticking with it!

@dirkgr merged commit c211baf into allenai:master on Aug 18, 2020