423 Locked error when creating a new component with languages and doing auto-translate #13345

Open
2 tasks done
matzeeable opened this issue Dec 19, 2024 · 20 comments

Comments

@matzeeable

Describe the issue

The following describes a sequential process of creating a new component and uploading the main POT file with new languages and auto translating them. Additionally, you can see the output within our custom logs.

1.) We create a new component with POST /api/projects/(string: project)/components/

https://translate.example.de/api/tasks/my-uuid/ {
  completed: true,
  progress: 100,
  result: { component: 10063 },
  log: 'wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version: rebase remote into repo b7600076b4c0975eb0c6bd90d4943afa75c74bae..b7600076b4c0975eb0c6bd90d4943afa75c74bae\n' +
    'wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version: scheduling update in background'
}

✅ This is successful.

2.) We install an addon with POST /api/components/(string: project)/(string: component)/addons/

Install addon weblate.gettext.msgmerge in wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version...
Installed addon weblate.gettext.msgmerge in wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version

✅ This is successful.

3.) We create languages in the freshly created component with POST /api/components/(string: project)/(string: component)/translations/

Create missing language fr@formal in wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version...
Create missing language de@informal in wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version...
Create missing language de@formal in wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version...
Create missing language it@formal in wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version...
Create missing language pl@formal in wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version...
Create missing language nl@informal in wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version...
Create missing language nl@formal in wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version...
Created missing languages

✅ This is successful.

4.) We upload the main POT file with POST /api/translations/(string: project)/(string: component)/(string: language)/file

Uploaded new source file: {
  not_found: 0,
  skipped: 0,
  accepted: 74,
  total: 74,
  result: true,
  count: 74
}

✅ This is successful.

5.) We auto translate the created languages with machine translation with POST /api/translations/(string: project)/(string: component)/(string: language)/autotranslate/

Auto translate fix-set-default-api-doc-version "fr@formal" (mode = translate, filter_type = todo, auto_source = mt, engines = deepl, component = , threshold = 90)...
Endpoint request failed: translations/wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal/autotranslate/
Locked. Retrying in 5 seconds... (Attempt 1/5)
Endpoint request failed: translations/wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal/autotranslate/
Locked. Retrying in 5 seconds... (Attempt 2/5)
Endpoint request failed: translations/wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal/autotranslate/
Locked. Retrying in 5 seconds... (Attempt 3/5)
Endpoint request failed: translations/wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal/autotranslate/
Locked. Retrying in 5 seconds... (Attempt 4/5)
Endpoint request failed: translations/wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal/autotranslate/
Locked. Retrying in 5 seconds... (Attempt 5/5)
Endpoint request failed: translations/wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal/autotranslate/
Max retries (5) reached. Giving up.

❌ The response is a 423 Locked error. All the requests to the REST API are sequential and never run in parallel. In our tests we also made sure that the component is not locked by user access.
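The retry behaviour shown in the log above can be sketched roughly as follows. This is a minimal sketch and not our actual CI code: `call_with_lock_retry`, the `send` callable and the delay values are hypothetical names chosen for illustration, and only the 423 status code is taken from the log. Unlike the fixed 5-second pause in the log, the sketch lets the delay grow on every attempt.

```python
import time
from typing import Callable

LOCKED = 423  # HTTP status returned by the API while the lock is held


def call_with_lock_retry(
    send: Callable[[], int],
    max_retries: int = 5,
    first_delay: float = 5.0,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns something other than 423.

    send is a hypothetical callable that performs one request and
    returns the HTTP status code. The last status seen is returned,
    which is still 423 if all retries were used up.
    """
    delay = first_delay
    status = send()
    for _ in range(max_retries):
        if status != LOCKED:
            break
        sleep(delay)  # wait before the next attempt
        delay *= factor  # grow the delay for the following attempt
        status = send()
    return status
```

Passing `sleep` as a parameter keeps the function testable without real waiting. A growing delay does not remove the lock, it only reduces the number of requests sent while the lock is held.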

But, I found this ticket: #4666, especially the comment #4666 (comment). Relevant code:

weblate/weblate/vcs/apps.py

Lines 104 to 116 in f595bb9

def post_migrate(self, sender: AppConfig, **kwargs) -> None:
    ensure_ssh_key()
    home = data_dir("home")
    if not os.path.exists(home):
        os.makedirs(home)
    # Configure merge driver for Gettext PO
    # We need to do this behind lock to avoid errors when servers
    # start in parallel
    lockfile = WeblateLock(
        home, "gitlock", 0, "", "lock:{scope}", "{scope}", timeout=120
    )

Is there a chance to find out what caused the lock? Why is it 120 seconds?

I already tried

  • I've read and searched the documentation.
  • I've searched for similar filed issues in this repository.

Steps to reproduce the behavior

See above.
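
For reference, the order of requests from the description can be sketched as a list of method and path pairs. This is a sketch under assumptions, not our real client: `build_requests`, `AUTOTRANSLATE_PAYLOAD` and the `source_language` parameter (with its default of `en`) are hypothetical names, and the trailing slash on the file upload path is assumed. The endpoint paths and the auto-translate parameters are the ones quoted in the steps and the log above.

```python
from typing import List, Tuple

# Parameters of the failing request, as printed in the log of step 5.
# Sending engines as a list is an assumption of this sketch.
AUTOTRANSLATE_PAYLOAD = {
    "mode": "translate",
    "filter_type": "todo",
    "auto_source": "mt",
    "engines": ["deepl"],
    "threshold": 90,
}


def build_requests(
    project: str,
    component: str,
    languages: List[str],
    source_language: str = "en",  # hypothetical default
) -> List[Tuple[str, str]]:
    """Return (method, path) pairs for steps 1 to 5 in call order."""
    comp = f"/api/components/{project}/{component}"
    trans = f"/api/translations/{project}/{component}"
    steps = [
        ("POST", f"/api/projects/{project}/components/"),  # 1. create component
        ("POST", f"{comp}/addons/"),  # 2. install addon
    ]
    # 3. one request per language to create the translation
    steps += [("POST", f"{comp}/translations/") for _ in languages]
    # 4. upload the POT file once
    steps.append(("POST", f"{trans}/{source_language}/file/"))
    # 5. one auto-translate request per language, sent sequentially
    steps += [("POST", f"{trans}/{lang}/autotranslate/") for lang in languages]
    return steps
```

The first request of step 5 is the one that comes back with 423.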

Expected behavior

No lock error

Screenshots

No response

Exception traceback

No response

How do you run Weblate?

Docker container

Weblate versions

No response

Weblate deploy checks

No response

Additional context

No response

@nijel
Member

nijel commented Dec 19, 2024

Weblate internally locks the component for some operations, for example to avoid concurrent manipulation of the files. Retrying later should work. You should be able to see what is going on in the server logs.

@matzeeable
Author

matzeeable commented Dec 20, 2024

Command to get the log:

sudo docker logs 3dd8d9fc44a1 --since "2024-12-19T10:35:27+01:00" --until "2024-12-19T10:50:27+01:00" \
    | grep -vE "received$|this revision has been already parsed, skipping update$" \
    | grep -v "wordpress-real-cookie-banner-frontend-javascript" \
    | grep -v "wordpress-real-cookie-banner-wordpressorg-readme" \
    | grep -v "wordpress-real-media-library" \
    | grep -v "wordpress-real-physical-media" \
    | grep -v "wordpress-real-thumbnail-generator" \
    | grep -v "wordpress-real-category-management" \
    | grep -v "devowl-wp-utils" \
    | grep 'wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal: starting automatic translation None: mt: deepl' -A 2000 --color -m 1

As you can see, there are other components (e.g. wordpress-real-media-library) updated concurrently, but this should not lead to any issues as the lock is at component-level, I guess.

This is our server log from the first /autotranslate request:

gunicorn stderr | [2024-12-19 10:38:42,950: INFO/20468] wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal: starting automatic translation None: mt: deepl
gunicorn stderr | INFO:weblate:wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal: starting automatic translation None: mt: deepl
nginx stdout | 127.0.0.1 - - [19/Dec/2024:10:38:47 +0100] "GET /healthz/ HTTP/1.1" 200 12 "-" "curl/7.88.1"
nginx stdout | 152.53.135.192 - - [19/Dec/2024:10:38:47 +0100] "POST /api/translations/wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal/autotranslate/ HTTP/1.1" 423 77 "-" "axios/1.7.2"

But from previous calls (I do not know from which request, but I think when uploading the new source file) I can see some locks with Acquired Lock('lock:lock:repo:10052'). and Acquired Lock('lock:lock:repo:non').. Could they cause the issue?
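
To see which locks show up around the failing request, the log output could be scanned with something like the following. It is a minimal sketch: `acquired_locks` is a hypothetical helper, and the pattern only assumes the `Acquired Lock('...')` wording quoted above, not any other part of the log format.

```python
import re
from collections import Counter

# Matches the "Acquired Lock('...')" fragments as quoted from the log.
LOCK_RE = re.compile(r"Acquired Lock\('([^']+)'\)")


def acquired_locks(log_text: str) -> Counter:
    """Count how often each lock name appears as acquired in log_text."""
    return Counter(LOCK_RE.findall(log_text))
```

Feeding it the output of the `docker logs` command above would give a count per lock name, which makes it easier to spot a lock that is taken repeatedly during the time window.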

Is there a chance to get the timeout of the lock with the Retry-After header? In terms of Continuous Localization, I think this would make sense, so our CI does not have to retry a fixed number of times and can instead just wait until the lock is free.

Another question: In the access logs, I see that some crawlers access the edit page of strings. Does this also lead to a lock?

For reference, I have found the following locks which have more than 5 seconds timeout:

self.lock = WeblateLock(
    lock_path=os.path.dirname(base_path),
    scope="repo",
    key=component.pk if component else os.path.basename(base_path),
    slug=os.path.basename(base_path),
    file_template="{slug}.lock",
    timeout=120,
)

return WeblateLock(
    backup_dir, "backuplock", 0, "", "lock:{scope}", ".{scope}", timeout=120
)

weblate/weblate/vcs/apps.py

Lines 114 to 116 in f595bb9

lockfile = WeblateLock(
    home, "gitlock", 0, "", "lock:{scope}", "{scope}", timeout=120
)

@nijel
Member

nijel commented Dec 20, 2024

The timeout applies when acquiring the lock and waiting while other process holds the lock.

The locking happens on component or repository level, so when components share a single repository they will wait for a single lock when Weblate is working with the repository.
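
As an illustration only (this is not Weblate's locking code), the effect of a shared repository lock and of the acquire timeout can be sketched with plain `threading.Lock` objects. The component names, the `REPO_OF` mapping and both helper functions are hypothetical.

```python
import threading
from typing import Dict

# Hypothetical mapping from component to the repository it works on.
REPO_OF = {
    "backend-php": "repo-1",
    "frontend-javascript": "repo-1",  # shares the repository with backend-php
    "other-plugin": "repo-2",
}

_locks: Dict[str, threading.Lock] = {}


def repo_lock(component: str) -> threading.Lock:
    """Return the single lock that guards the component's repository."""
    return _locks.setdefault(REPO_OF[component], threading.Lock())


def try_work(component: str, timeout: float) -> bool:
    """Try to take the repository lock, waiting at most timeout seconds.

    Returns False if another holder did not release it in time, which
    is the situation a client would see as a locked response.
    """
    return repo_lock(component).acquire(timeout=timeout)
```

While `backend-php` holds the lock, `try_work("frontend-javascript", 0.1)` gives up after the timeout because both names resolve to the same lock, whereas `other-plugin` is unaffected. The timeout bounds how long the waiting side waits, not how long the lock is held.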

@matzeeable
Author

Ok, we have this shared repository: https://translate.owlinfra.de/projects/shared-glossaries/real-cookie-banner/:

image

So, in this case, when e.g. WordPress Real Cookie Banner (Backend, PHP) and WordPress Real Cookie Banner (Frontend, JavaScript) call the /autotranslate API endpoint concurrently, they could lock each other?

@nijel
Member

nijel commented Dec 27, 2024

Yes.

@matzeeable
Author

This is only a glossary and is used for the DeepL support for glossaries (#10519). As this glossary is not affected by the e.g. /autotranslate route, I do not understand exactly why this is locked. In general, would it be possible to not lock glossaries at all?

@nijel
Member

nijel commented Jan 14, 2025

> As this glossary is not affected by the e.g. /autotranslate route, I do not understand exactly why this is locked.

I'm confused now, you get the locking when calling autotranslate, so how is the glossary not affected by it?

> In general, would it be possible to not lock glossaries at all?

Locking is necessary to avoid concurrent operations on the underlying repository. For database operations, we're slowly progressing towards row level locking, but we're not yet fully there for some code paths.

@matzeeable
Author

Yes, I am getting the locked error when I call the /autotranslate endpoint but not for the glossary. The glossary is shared with the component. We have this example scenario:

+--> shared to   - my-project
|                  - my-component 
|                    - de@formal      -> we are calling /autotranslate on this
|                - shared glossaries (project)
+----<<            - my-glossary (component)
                     - de@formal

> Locking is necessary to avoid concurrent operations on the underlying repository.

But at this time, the Glossary is only used for read operation.

We have now added a workaround to just wait two minutes on a 423 error, but we are still running into the issue. Is there a chance to get the timeout of the lock with the Retry-After header?

nijel added a commit to nijel/weblate that referenced this issue Jan 21, 2025
Include scope and compoent in the error message so that it gives more
insight where the blocking operation is happening.

Issue WeblateOrg#13345
@nijel
Member

nijel commented Jan 21, 2025

Why do you think it's the glossary that is being locked? There is no locking involved when reading glossaries to be used in DeepL.

There are two kinds of lock which can influence this:

  • The repository is locked while it is being manipulated (commit, merge, ...). This can happen if pending changes are being written out by Weblate.
  • Component is locked for bulk edits like automatic translation or when parsing translation files.

#13606 will make Weblate tell in the error message what kind of lock is causing this error.

@matzeeable
Author

> Why do you think it's the glossary that is being locked? There is no locking involved when reading glossaries to be used in DeepL.

Because of #13345 (comment), sorry, if I understood something wrong.

> #13606 will make Weblate tell in the error message what kind of lock is causing this error.

Nice, thanks for your efforts!

> This can happen if pending changes are being written out by Weblate.

So, if I understand correctly, the REST API could already send a response back to our CI pipeline even when the repository is not yet unlocked / commits are pending? If yes, would it be useful to have something like an await_commit=true for the /autotranslate route so our CI can do subsequent /autotranslate requests for the other languages?

nijel added a commit that referenced this issue Jan 21, 2025
Include scope and compoent in the error message so that it gives more
insight where the blocking operation is happening.

Issue #13345
@nijel
Member

nijel commented Jan 21, 2025

Ah, sorry, I misunderstood your question then.

You should be able to see in the server logs what is going on while you get this error. The committing might be it, but there might be a different reason as well.

@matzeeable
Author

How can I test those changes? We are currently using docker. What are your thoughts about the Retry-After header?

@nijel
Member

nijel commented Jan 22, 2025

We have no clue how long the lock will be held, so I don't see a reasonable way to produce the Retry-After header.

PS: We should really revisit locking. The single lock to prevent all consistency issues on component level is probably not a viable approach. I've created #13623 to track this task.

@matzeeable
Author

> If yes, would it be useful to have something like an await_commit=true for the /autotranslate route so our CI can do subsequent /autotranslate requests for the other languages?

What do you think about this? This would ensure committing is done directly within the request and a response is only sent back when it has finished. I also found POST /api/projects/(string: project)/repository/ which I could use, but I guess it just "triggers" the commit and does not await it?

When I think about all this, I would bring in another question: Would it make sense to provide a REST API to "pause" commits so the CI job would look like this:

  1. Pause the repository commit mechanism
  2. Do all the /autotranslate mechanisms
  3. Commit manually via POST /api/projects/(string: project)/repository/ which continues the previous pause.

@nijel
Member

nijel commented Jan 22, 2025

The commits are done only when needed. That currently translates to changing an already pending string (until #8770 is implemented).

But as mentioned before, it might be something different from committing in your case. Check server logs, what is actually happening at that time.

@nijel
Member

nijel commented Jan 30, 2025

In case you rely on translation propagation, #13665 might have addressed the root cause of this issue.

@matzeeable
Author

Do you have an ETA when #13665 will be released? I did not yet test it as I am running within Docker and there is no updated docker image for "unreleased" fixes/betas.

@nijel
Member

nijel commented Feb 3, 2025

The bleeding tag should have that included, see https://docs.weblate.org/en/latest/admin/install/docker.html#choosing-docker-image-tag
