
Translations differ #546

Closed · anderleich opened this issue Aug 25, 2021 · 17 comments · Fixed by #573

Labels: bug (Something isn't working)

Comments

@anderleich commented Aug 25, 2021

Hi,

I've recently realized that my converted OpenNMT-py model does not return the same translation as the original OpenNMT-py model. The model architecture is the default base Transformer. I'm using CTranslate2 version 1.18.1.

It seems the CTranslate2 model is merging different hypotheses into the final result, thus inserting word repetitions and synonyms into the translation.

Example result:

OpenNMT-py: 2013ko abuztutik aurrera, ikertzaile txinatarren zenbait taldek instalazioetan egindako CRISPR bidezko lehen edizio arrakastatsuak dokumentatu zituzten.

Ctranslate2: 2013ko abuztutik aurrera, Txinako zenbait ikertzailbatzuek instalazioetan egindako CRISPR bidezko lehen edizio genetiko arrakastatsuak dokumentatu zituzten.

In the second case it should be either zenbait ikertzailek or ikertzaile batzuek, but it is combining both of them and even truncating some words.

This is the configuration file for the server:

{
    "models_root": "/absolute/path/to/model/dir/",
    "models": [
        {
            "id": 100,
            "ct2_model": "BEST_MODEL.pt_ctrans",
            "model": "BEST_MODEL.pt",
            "timeout": 600,
            "on_timeout": "to_cpu",
            "load": true,
            "opt": {
                "gpu": 0,
                "batch_size": 64,
                "beam_size": 5,
                "max_length": 200
            },
            "tokenizer": {
                "type": "pyonmttok",
                "mode": "conservative",
                "params": {
                    "bpe_model_path": "/absolute/path/codes.bpe",
                    "joiner": "\uffed",
                    "joiner_annotate": true,
                    "case_markup": true
                }
            }
        }
    ]
}
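
For reference, the tokenizer block should correspond to roughly this pyonmttok call (a sketch of how I understand the server builds the tokenizer; the BPE path is a placeholder):

import pyonmttok

# Mirrors the "tokenizer" section of the server config above.
tokenizer = pyonmttok.Tokenizer(
    "conservative",
    bpe_model_path="/absolute/path/codes.bpe",
    joiner="\uffed",
    joiner_annotate=True,
    case_markup=True,
)

tokens, _ = tokenizer.tokenize("Todas estas críticas podrían rebatirse.")
print(tokens)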

Is this a known issue?

Thanks

@guillaumekln (Collaborator)

If you converted the model with quantization, then differences are expected.

Without quantization, there is no guarantee that the translations are the same. In rare cases there could be small differences because the implementations are different. The difference (if any) can either improve or degrade the translation.
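
To make the distinction concrete, here is a minimal conversion sketch, assuming the CTranslate2 2.x Python converter API (the quantization choice is made at conversion time):

from ctranslate2.converters import OpenNMTPyConverter

converter = OpenNMTPyConverter("BEST_MODEL.pt")

# Full-precision conversion: outputs should closely match OpenNMT-py,
# up to small numerical differences between the implementations.
converter.convert("BEST_MODEL.pt_ctrans", force=True)

# Quantized conversion: output differences are expected.
converter.convert("BEST_MODEL.pt_ctrans_int8", quantization="int8", force=True)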

@anderleich (Author)

I see... Is that why it is combining words too?

@guillaumekln (Collaborator) commented Aug 30, 2021

If there are differences, they could be of any type. Here it could mean that an extra joiner was generated.

Did you find other differences apart from this specific example?
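
To illustrate what an extra joiner does at detokenization time, a toy sketch using the joiner character from the config above:

import pyonmttok

tokenizer = pyonmttok.Tokenizer("conservative", joiner="\uffed", joiner_annotate=True)

# Two separate tokens detokenize to two words.
print(tokenizer.detokenize(["ikertzaile", "batzuek"]))       # ikertzaile batzuek

# A spurious joiner on the first token glues the words together,
# which would produce merged forms like the ones reported above.
print(tokenizer.detokenize(["ikertzail\uffed", "batzuek"]))  # ikertzailbatzuek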

@BrightXiaoHan (Contributor)

I also get different translation results when running inference on GPU versus CPU.

@guillaumekln (Collaborator) commented Sep 2, 2021

Differences between CPU and GPU outputs are a separate issue; please open a new issue if required. Such differences are generally expected since the two backends do not produce numerically identical results.
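
For anyone who wants to check this on their own model, a minimal comparison sketch (the model path and input tokens are placeholders; the exact result structure varies across CTranslate2 versions):

import ctranslate2

tokens = [["Hello", "world"]]  # pre-tokenized input

for device in ("cpu", "cuda"):
    translator = ctranslate2.Translator("BEST_MODEL.pt_ctrans", device=device)
    results = translator.translate_batch(tokens, beam_size=5)
    # Recent versions return result objects; older 1.x versions return dicts.
    print(device, results[0])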

@anderleich (Author) commented Sep 6, 2021

I did some tests a while ago translating several test files with OpenNMT-py models and CTranslate2 converted models, and I recall obtaining the same results in both cases. That's why I was surprised to see cases where words are either repeated or joined. The joined words seem to come from different hypotheses. I just posted one example, but I have more examples in which similar joining issues happen.
I understand there can be some differences, and that the model could be generating extra joiners. It is the nature of those joins that surprises me, though.

@guillaumekln (Collaborator)

Are you able to share this model and a test case? If yes, I could take a deeper look and see why this is happening.

@anderleich (Author)

I'm afraid I cannot share the model, sorry...

@guillaumekln (Collaborator)

> Model architecture is the default base Transformer

To be sure, can you post the exact set of options you used to train this model?

@anderleich (Author)

I used the default Transformer architecture options described in the docs:

https://opennmt.net/OpenNMT-py/FAQ.html#how-do-i-use-the-transformer-model

@anderleich (Author)

Hi @guillaumekln ,

We've been conducting some more tests to pinpoint the source of the error. We've found strange behavior when translating with the server and a CTranslate2 model. As you mentioned, we can expect some differences between the translations returned by CTranslate2 and the original OpenNMT-py model. However, we were surprised to find that translations can differ even between requests, depending on the number of sentences sent to the server.

Translating the sentence in isolation results in the issue mentioned in this thread:
curl -X POST -H 'Content-type: application/json' -d '[{"id":100,"src":"Todas estas críticas podrían rebatirse o confirmarse con análisis adicionales por parte del equipo de Mitalipov."}]' "http://localhost:9866/translator/translate"

Result
[[{"n_best":1,"pred_score":-11.816484451293945,"src":"Todas estas cr\u00edticas podr\u00edan rebatirse o confirmarse con an\u00e1lisis adicionales por parte del equipo de Mitalipov.","tgt":"Kritika horiek guztiak Mitalipovdaitezke, Mitalipov-eko taldearen analisi gehigarrien bidez."}]]

Notice the word Mitalipovdaitezke, which merges two valid words but does not really make sense.

However, sending another sentence along with the previous one drastically changes the result:
curl -X POST -H 'Content-type: application/json' -d '[{"id":100,"src":"Todas estas críticas podrían rebatirse o confirmarse con análisis adicionales por parte del equipo de Mitalipov."}, {"id": 100, "src": "Todas estas críticas podrían rebatirse o confirmarse con análisis adicionales por parte del equipo de Corral."}]' "http://localhost:9866/translator/translate"

[[{"n_best":1,"pred_score":-11.816478729248047,"src":"Todas estas cr\u00edticas podr\u00edan rebatirse o confirmarse con an\u00e1lisis adicionales por parte del equipo de Mitalipov.","tgt":"Kritika horiek guztiak gaitzetsi edo berrets daitezke, Mitalipov-eko taldearen analisi gehigarrien bidez."},{"n_best":1,"pred_score":-11.59411334991455,"src":"Todas estas cr\u00edticas podr\u00edan rebatirse o confirmarse con an\u00e1lisis adicionales por parte del equipo de Corral.","tgt":"Kritika horiek guztiak gaitzetsi edo berrets daitezke, Corraleko taldearen analisi gehigarrien bidez."}]]

In this case the sentence is correctly translated.
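
The same comparison can be scripted, e.g. with Python's requests (host, port, and the response layout taken from the curl examples above):

import requests

url = "http://localhost:9866/translator/translate"
src = ("Todas estas críticas podrían rebatirse o confirmarse "
       "con análisis adicionales por parte del equipo de Mitalipov.")

single = requests.post(url, json=[{"id": 100, "src": src}]).json()
batch = requests.post(url, json=[
    {"id": 100, "src": src},
    {"id": 100, "src": src.replace("Mitalipov", "Corral")},
]).json()

# The first hypothesis should be identical in both responses,
# but here it differs depending on the batch size.
print(single[0][0]["tgt"])
print(batch[0][0]["tgt"])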

@guillaumekln (Collaborator) commented Oct 4, 2021

Are you translating on CPU or GPU?

@anderleich (Author)

on GPU

@guillaumekln (Collaborator)

Thanks for the additional observation.

I was able to reproduce a bug affecting the beam search: in some cases, incorrect word IDs were selected when building the final hypotheses. This explains the incorrect translations you are reporting.

Thanks again. This is an important bug fix that I will release ASAP.
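
For readers wondering what selecting the wrong word IDs means in practice, here is a toy sketch of the backtracking step of beam search (illustrative only, not CTranslate2's actual code): hypotheses are rebuilt from per-step word IDs and backpointers, so reading even one ID from the wrong beam slot splices tokens from different hypotheses together, which is exactly the merged-word symptom above.

# Toy beam-search backtracking (illustrative only).
# word_ids[t][k]: token chosen at step t for beam slot k
# backptrs[t][k]: beam slot at step t-1 that slot k extends
def backtrack(word_ids, backptrs, final_slot):
    tokens = []
    slot = final_slot
    for t in range(len(word_ids) - 1, -1, -1):
        tokens.append(word_ids[t][slot])  # reading the wrong slot here mixes hypotheses
        slot = backptrs[t][slot]
    return list(reversed(tokens))

word_ids = [["zenbait", "Txinako"], ["ikertzaile", "ikertzail\uffed"], ["taldek", "batzuek"]]
backptrs = [[0, 1], [0, 1], [0, 1]]
print(backtrack(word_ids, backptrs, 0))  # ['zenbait', 'ikertzaile', 'taldek']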

@guillaumekln added the bug label Oct 4, 2021
@anderleich (Author)

Glad to hear that!
Could you let me know when you release the new version on PyPI?
Thanks!

@guillaumekln (Collaborator)

I pushed version 2.5.1 with the fix.

@anderleich (Author)

Thanks!
