matrix.org can't join #libera:libera.chat #15115
Comments
You're right, we do log nonetheless, so the lack of a processed line is suspicious. Maybe the request is still ongoing, maybe the log context has been eaten, or quite possibly something else I haven't thought of.
Since the last message, it's become pretty clear that the requests were just getting stuck in a lineariser, because earlier joins were stuck in the critical section waiting for full state (and the room is stuck in partial state!).
That has likely been fixed by upgrading matrix.org to what has now been marked as v1.78.0rc1. At the time of this issue, matrix.org was on an old version of Synapse that did not support joins during partial state.
However, the room is still stuck in partial state. This seems to be because:
We don't retry
Some logs from the startup today:
502 Bad Gateway sounds like it's the load balancer or Cloudflare (if applicable) cutting us off after some time. I can't actually find that exact request.
I'm not sure what FRRJ (faster remote room joins) is buying us here: it's requesting the full state at a point in the room. This returns the same amount of data as a non-FRRJ
I will look at the situation on Libera further to see whether there's any real reason for it taking so long.
Possibly because of my conveniently timed restart, the initial part of the resync went through earlier and it moved on to doing further
However, there's a problem: the room state is too big (the response is > 500 MiB) and it's hitting a response body size limit:
# As with /send_join, /state responses can be huge.
MAX_RESPONSE_SIZE = 500 * 1024 * 1024

Spun off as #15127.
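For illustration, a limit like this usually trips while the body is still being streamed, not after it has been read in full. A minimal sketch, assuming a file-like response stream; `read_with_limit` and `ResponseTooLarge` are hypothetical names, not Synapse's API:

```python
import io

# Same value as the constant quoted above.
MAX_RESPONSE_SIZE = 500 * 1024 * 1024


class ResponseTooLarge(Exception):
    """Hypothetical error raised when a body exceeds the configured limit."""


def read_with_limit(stream, max_size=MAX_RESPONSE_SIZE, chunk_size=64 * 1024):
    """Read a file-like stream in chunks, aborting once max_size is exceeded."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # End of stream: the whole body fitted within the limit.
            return bytes(buf)
        if len(buf) + len(chunk) > max_size:
            # Stop early instead of buffering an oversized body.
            raise ResponseTooLarge(f"response body exceeds {max_size} bytes")
        buf.extend(chunk)


if __name__ == "__main__":
    # A tiny limit stands in for the real one so the demo runs instantly.
    try:
        read_with_limit(io.BytesIO(b"x" * 32), max_size=16, chunk_size=8)
    except ResponseTooLarge as exc:
        print("aborted:", exc)
```

Under this model, a /state response over 500 MiB would fail every time it is retried, however long the remote takes to produce it.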
I have joined using Element Desktop 1.11.23 on Windows 7.
Example 524 logs as seen from the client
@JeanPaulLucien This is not a client issue, so there is no need to provide client logs.
@progval I know that this is not a client issue. Element Desktop shows logs from the server.
Description
The CS API /join times out with a 524. The serverside doesn't seem to log anything sensible (looking at the event_creator_users worker).
It looks like we're not logging a response because the client has disconnected (which seems strange; I thought we logged the response regardless). Meanwhile, something is silently wedging solid after looking up the alias. Perhaps it is getting stuck on a lineariser lock for the room, or something similar?
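To illustrate the lineariser theory, here is a minimal asyncio sketch, not Synapse's actual code. `ToyLineariser`, `join`, and the room ID are hypothetical names. It only assumes a per-room lock where an earlier join holds the critical section while waiting on an event that never fires, so a later join for the same room times out without doing anything visible:

```python
import asyncio
from collections import defaultdict


class ToyLineariser:
    """Hypothetical per-key lock: one task per key in the critical section."""

    def __init__(self):
        self._locks = defaultdict(asyncio.Lock)

    def lock_for(self, key):
        return self._locks[key]


async def join(lin, room_id, full_state, wait_for_full_state):
    async with lin.lock_for(room_id):
        if wait_for_full_state:
            # Blocks for as long as the room stays in partial state.
            await full_state.wait()
        return "joined"


async def demo():
    lin = ToyLineariser()
    full_state = asyncio.Event()  # never set: the room is stuck in partial state
    room_id = "!room:example.org"

    # An earlier join takes the lock and then waits for full state.
    early = asyncio.create_task(join(lin, room_id, full_state, True))
    await asyncio.sleep(0)  # give the earlier join a chance to acquire the lock

    # A later join needs nothing but the lock, yet it still cannot proceed.
    try:
        outcome = await asyncio.wait_for(
            join(lin, room_id, full_state, False), timeout=0.1
        )
    except asyncio.TimeoutError:
        outcome = "timed out"

    # Clean up the wedged task so the event loop can exit quietly.
    early.cancel()
    await asyncio.gather(early, return_exceptions=True)
    return outcome


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this toy model the later join never reaches its own logic, which would be consistent with a request that produces no useful log lines before the client gives up.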
Steps to reproduce
Homeserver
matrix.org
Synapse Version
{"server_version":"1.77.0rc1 (b=matrix-org-hotfixes,19bb342763)","python_version":"3.8.12"}
Installation Method
pip (from PyPI)
Database
postgres
Workers
Multiple workers
Platform
linux
Configuration
No response
Relevant log output
Anything else that would be useful to know?
This is causing avoidable drama with Libera.