
Translations differ #546

Closed · anderleich opened this issue Aug 25, 2021 · 17 comments · Fixed by #573

Labels: bug (Something isn't working)

Comments

@anderleich commented Aug 25, 2021

Hi,

I've recently realized that my converted OpenNMT-py model does not return the same translation as the original OpenNMT-py model. The model architecture is the default base Transformer. I'm using CTranslate2 version 1.18.1.

It seems the CTranslate2 model is merging different hypotheses into the final result, thus inserting word repetitions and synonyms into the translation.

Example result:

OpenNMT-py: 2013ko abuztutik aurrera, ikertzaile txinatarren zenbait taldek instalazioetan egindako CRISPR bidezko lehen edizio arrakastatsuak dokumentatu zituzten.

Ctranslate2: 2013ko abuztutik aurrera, Txinako zenbait ikertzailbatzuek instalazioetan egindako CRISPR bidezko lehen edizio genetiko arrakastatsuak dokumentatu zituzten.

In the second case it should be either zenbait ikertzailek or ikertzaile batzuek, but it is combining both of them and even truncating some words.

This is the configuration file for the server:

{
    "models_root": "/absolute/path/to/model/dir/",
    "models": [
        {
            "id": 100,
            "ct2_model": "BEST_MODEL.pt_ctrans",
            "model": "BEST_MODEL.pt",
            "timeout": 600,
            "on_timeout": "to_cpu",
            "load": true,
            "opt": {
                "gpu": 0,
                "batch_size": 64,
                "beam_size": 5,
                "max_length": 200
            },
            "tokenizer": {
                "type": "pyonmttok",
                "mode": "conservative",
                "params": {
                    "bpe_model_path": "/absolute/path/codes.bpe",
                    "joiner": "\uffed",
                    "joiner_annotate": true,
                    "case_markup": true
                }
            }
        }
    ]
}
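
For reference, the tokenizer block should correspond to roughly this pyonmttok call (a sketch of how I understand the server builds the tokenizer; the BPE path is a placeholder):

import pyonmttok

# Mirrors the "tokenizer" section of the server config above.
tokenizer = pyonmttok.Tokenizer(
    "conservative",
    bpe_model_path="/absolute/path/codes.bpe",
    joiner="\uffed",
    joiner_annotate=True,
    case_markup=True,
)

tokens, _ = tokenizer.tokenize("Todas estas críticas podrían rebatirse.")
print(tokens)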

Is this a known issue?

Thanks

@guillaumekln (Collaborator)

If you converted the model with quantization, then differences are expected.

Without quantization, there is no guarantee that the translations are the same. In rare cases there could be small differences because the implementations are different. The difference (if any) can either improve or degrade the translation.
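
To make the distinction concrete, here is a minimal conversion sketch, assuming the CTranslate2 2.x Python converter API (the quantization choice is made at conversion time):

from ctranslate2.converters import OpenNMTPyConverter

converter = OpenNMTPyConverter("BEST_MODEL.pt")

# Full-precision conversion: outputs should closely match OpenNMT-py,
# up to small numerical differences between the implementations.
converter.convert("BEST_MODEL.pt_ctrans", force=True)

# Quantized conversion: output differences are expected.
converter.convert("BEST_MODEL.pt_ctrans_int8", quantization="int8", force=True)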

@anderleich (Author)

I see... Is that why it is combining words too?

@guillaumekln (Collaborator) commented Aug 30, 2021

If there are differences, they could be of any type. Here it could mean that an extra joiner was generated.

Did you find other differences apart from this specific example?
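
To illustrate what an extra joiner does at detokenization time, a toy sketch using the joiner character from the config above:

import pyonmttok

tokenizer = pyonmttok.Tokenizer("conservative", joiner="\uffed", joiner_annotate=True)

# Two separate tokens detokenize to two words.
print(tokenizer.detokenize(["ikertzaile", "batzuek"]))       # ikertzaile batzuek

# A spurious joiner on the first token glues the words together,
# which would produce merged forms like the ones reported above.
print(tokenizer.detokenize(["ikertzail\uffed", "batzuek"]))  # ikertzailbatzuek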

@BrightXiaoHan (Contributor)

I also get different translation results when running inference on GPU versus CPU.

@guillaumekln (Collaborator) commented Sep 2, 2021

Differences between CPU and GPU outputs are a separate issue; please open a new issue if required. Such differences are generally expected since the two backends do not produce numerically identical results.
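
For anyone who wants to check this on their own model, a minimal comparison sketch (the model path and input tokens are placeholders; the exact result structure varies across CTranslate2 versions):

import ctranslate2

tokens = [["Hello", "world"]]  # pre-tokenized input

for device in ("cpu", "cuda"):
    translator = ctranslate2.Translator("BEST_MODEL.pt_ctrans", device=device)
    results = translator.translate_batch(tokens, beam_size=5)
    # Recent versions return result objects; older 1.x versions return dicts.
    print(device, results[0])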

@anderleich (Author) commented Sep 6, 2021

I did some tests a while ago translating several test files with OpenNMT-py models and CTranslate2 converted models, and I recall obtaining the same results in both cases. That's why I was surprised to see cases where words are either repeated or joined. The joined words seem to come from different hypotheses. I just posted one example, but I have more examples in which similar joining issues happen.
I understand there can be some differences, and that the model could be generating extra joiners. It is the nature of those joins that surprises me, though.

@guillaumekln (Collaborator)

Are you able to share this model and a test case? If yes, I could take a deeper look and see why this is happening.

@anderleich (Author)

I'm afraid I cannot share the model, sorry...

@guillaumekln (Collaborator)

> Model architecture is the default base Transformer

To be sure, can you post the exact set of options you used to train this model?

@anderleich (Author)

I used the default Transformer architecture options described in the docs:

https://opennmt.net/OpenNMT-py/FAQ.html#how-do-i-use-the-transformer-model

@anderleich (Author)

Hi @guillaumekln ,

We've been conducting some more tests to pinpoint the source of the error. We've found strange behavior when translating with the server and a CTranslate2 model. As you mentioned, we can expect some differences between the translations returned by CTranslate2 and the original OpenNMT-py model. However, we were surprised to find that translations can differ even between requests, depending on the number of sentences sent to the server.

Translating the sentence in isolation results in the issue mentioned in this thread:
curl -X POST -H 'Content-type: application/json' -d '[{"id":100,"src":"Todas estas críticas podrían rebatirse o confirmarse con análisis adicionales por parte del equipo de Mitalipov."}]' "http://localhost:9866/translator/translate"

Result
[[{"n_best":1,"pred_score":-11.816484451293945,"src":"Todas estas cr\u00edticas podr\u00edan rebatirse o confirmarse con an\u00e1lisis adicionales por parte del equipo de Mitalipov.","tgt":"Kritika horiek guztiak Mitalipovdaitezke, Mitalipov-eko taldearen analisi gehigarrien bidez."}]]

Notice the word Mitalipovdaitezke, which merges two valid words but does not really make sense.

However, sending another sentence along with the previous one drastically changes the result:
curl -X POST -H 'Content-type: application/json' -d '[{"id":100,"src":"Todas estas críticas podrían rebatirse o confirmarse con análisis adicionales por parte del equipo de Mitalipov."}, {"id": 100, "src": "Todas estas críticas podrían rebatirse o confirmarse con análisis adicionales por parte del equipo de Corral."}]' "http://localhost:9866/translator/translate"

[[{"n_best":1,"pred_score":-11.816478729248047,"src":"Todas estas cr\u00edticas podr\u00edan rebatirse o confirmarse con an\u00e1lisis adicionales por parte del equipo de Mitalipov.","tgt":"Kritika horiek guztiak gaitzetsi edo berrets daitezke, Mitalipov-eko taldearen analisi gehigarrien bidez."},{"n_best":1,"pred_score":-11.59411334991455,"src":"Todas estas cr\u00edticas podr\u00edan rebatirse o confirmarse con an\u00e1lisis adicionales por parte del equipo de Corral.","tgt":"Kritika horiek guztiak gaitzetsi edo berrets daitezke, Corraleko taldearen analisi gehigarrien bidez."}]]

In this case the sentence is correctly translated.
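
The same comparison can be scripted, e.g. with Python's requests (host, port, and the response layout taken from the curl examples above):

import requests

url = "http://localhost:9866/translator/translate"
src = ("Todas estas críticas podrían rebatirse o confirmarse "
       "con análisis adicionales por parte del equipo de Mitalipov.")

single = requests.post(url, json=[{"id": 100, "src": src}]).json()
batch = requests.post(url, json=[
    {"id": 100, "src": src},
    {"id": 100, "src": src.replace("Mitalipov", "Corral")},
]).json()

# The first hypothesis should be identical in both responses,
# but here it differs depending on the batch size.
print(single[0][0]["tgt"])
print(batch[0][0]["tgt"])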

@guillaumekln (Collaborator) commented Oct 4, 2021

Are you translating on CPU or GPU?

@anderleich (Author)

on GPU

@guillaumekln (Collaborator)

Thanks for the additional observation.

I was able to reproduce a bug affecting the beam search: in some cases, incorrect word IDs were selected when building the final hypotheses. This explains the incorrect translations you are reporting.

Thanks again. This is an important bug fix that I will release ASAP.
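
For readers wondering what selecting the wrong word IDs means in practice, here is a toy sketch of the backtracking step of beam search (illustrative only, not CTranslate2's actual code): hypotheses are rebuilt from per-step word IDs and backpointers, so reading even one ID from the wrong beam slot splices tokens from different hypotheses together, which is exactly the merged-word symptom above.

# Toy beam-search backtracking (illustrative only).
# word_ids[t][k]: token chosen at step t for beam slot k
# backptrs[t][k]: beam slot at step t-1 that slot k extends
def backtrack(word_ids, backptrs, final_slot):
    tokens = []
    slot = final_slot
    for t in range(len(word_ids) - 1, -1, -1):
        tokens.append(word_ids[t][slot])  # reading the wrong slot here mixes hypotheses
        slot = backptrs[t][slot]
    return list(reversed(tokens))

word_ids = [["zenbait", "Txinako"], ["ikertzaile", "ikertzail\uffed"], ["taldek", "batzuek"]]
backptrs = [[0, 1], [0, 1], [0, 1]]
print(backtrack(word_ids, backptrs, 0))  # ['zenbait', 'ikertzaile', 'taldek']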

@guillaumekln added the bug label Oct 4, 2021
@anderleich (Author)

Glad to hear that!
Could you let me know when you release the new version on PyPI?
Thanks!

@guillaumekln (Collaborator)

I pushed version 2.5.1 with the fix.

@anderleich (Author)

Thanks!
