This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Unable to join specific room over federation with any account associated with a server that suffered a data loss event #11433

Closed
274below opened this issue Nov 26, 2021 · 13 comments
Labels
X-Needs-Info This issue is blocked awaiting information from the reporter

Comments

@274below

274below commented Nov 26, 2021

Description

A while ago, I set up two Synapse instances: one for more "production" use cases, and one for testing various server configurations. These two servers federated with each other (and the broader network), and all was fine. On the "production" server, a user from the testing server was added to the primary channel and given admin rights over the room (priv 100).

Then the development server was deleted without having the corresponding account leave the room first. After having that server rebuilt, I'm now stuck in something of a predicament:

  • I can't kick the account from the room as it's at an equal privilege level as the other highest accounts in the room (100)
  • I can't re-invite the account to the room as it's still in the room as far as the production server is concerned
  • I can't /join the room due to missing auth events ("Failed to join room", "Auth events cannot be found")
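The first bullet matches the kick rule in the Matrix room authorization rules: the sender needs at least the room's kick level and a strictly higher power level than the target. A minimal sketch of that check follows; the function and argument names are illustrative and are not taken from Synapse's code.

```python
def can_kick(sender_level: int, target_level: int, kick_level: int = 50) -> bool:
    """Simplified kick check modelled on the Matrix authorization rules.

    The sender must meet the room's kick level and must outrank the target.
    kick_level defaults to 50, the value the spec uses when the room's
    power levels do not set one.
    """
    return sender_level >= kick_level and sender_level > target_level


print(can_kick(100, 50))   # an admin acting on a moderator
print(can_kick(100, 100))  # two accounts at priv 100, as in this report
```

Under this rule two accounts at priv 100 can never kick each other, which is why the stale account cannot be removed from the production side.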

This also has the odd side effect that no account from the development server can join the same channel on the production server. When I try, Synapse tries to insert a row into local_current_membership referring to the account that's still "in" the channel, but with a null value for the membership column. This results in the row being rejected, and an Internal Server Error being thrown back to the client.
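The rejection described above can be reproduced in miniature. This sketch uses Python's built-in sqlite3 and a simplified stand-in for local_current_membership; the column list is an assumption for illustration, and the real deployment here uses PostgreSQL, where the error class differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in table; assumed columns, not Synapse's full schema.
conn.execute(
    """
    CREATE TABLE local_current_membership (
        room_id    TEXT NOT NULL,
        user_id    TEXT NOT NULL,
        event_id   TEXT NOT NULL,
        membership TEXT NOT NULL
    )
    """
)


def try_insert(membership):
    """Return None if the row is accepted, else the constraint error text."""
    try:
        conn.execute(
            "INSERT INTO local_current_membership VALUES (?, ?, ?, ?)",
            ("!room:production.com", "@userA:development.com", "$event", membership),
        )
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


print(try_insert("join"))  # accepted
print(try_insert(None))    # rejected: membership may not be NULL
```

The second call fails inside the database layer, which is the kind of failure that would surface to the client as an Internal Server Error.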

Steps to reproduce

  • Create a production.com Synapse instance, as well as #room:production.com, which is owned by @owner:production.com (priv 100)
  • Create a development.com Synapse instance, as well as @userA:development.com
  • Invite @userA:development.com to #room:production.com and grant it admin rights to the room (priv 100)
  • Have @userA:development.com accept the invite and join the room
  • Stop the development.com Synapse instance, delete and recreate both the database and the signing key, then start the development.com instance again
  • Create @userB:development.com and try to join #room:production.com
    • This will fail with Internal Server Error because the development.com instance tries to insert a row that violates the database constraints
    • Observe that the row being inserted actually refers to @userA:development.com (the original account that joined the room) and not @userB:development.com (the account that has never joined the room before and now cannot)
  • Create @userA:development.com again and try everything you can think of to get it either joined back or kicked from #room:production.com before you file a bug report

Version information

  • Homeserver:

If not matrix.org:

  • Version: 1.47.1

  • Install method: pip

  • Platform: Fedora 35, python 3.10.0
@274below
Author

A few thoughts on this:

  • I decided on opening this bug report due to the fact that all accounts on development.com are prevented from joining the room going forward.
  • I've read every doc I could find when it comes to how to remove users from the rooms, policies around data deletion and account removal, etc. I generally understand why accounts are deactivated, and not deleted. With that said, there needs to be some way to forcibly kick equal-privilege accounts from rooms.
  • Deleting the room wholesale is not something that I am really interested in doing just to resolve this, but I also can't think of any other way to resolve this with the tools that I have available to me.
  • I'm not opposed to manually editing the development.com database to get this resolved, but in an actual data loss event in production that involves any non-zero amount of federated communication, the state of the state here just isn't good. (I'm also skeptical if any amount of database editing could solve this, given that the signing key was also lost.)
  • Due to this, I can no longer join #homeservers:matrix.org with my development account, presumably because it's trying to insert the same bad row into its database when I try to rejoin :)

@274below 274below changed the title Unable to join room over federation with any account due to data loss event Unable to join specific room over federation with any account associated with a server that suffered a to data loss event Nov 26, 2021
@274below 274below changed the title Unable to join specific room over federation with any account associated with a server that suffered a to data loss event Unable to join specific room over federation with any account associated with a server that suffered a data loss event Nov 26, 2021
@274below
Author

@babolivier
Contributor

Hey, and thanks for the report!

My theory is that while joining a room the development server realises that it's missing a membership from @userA:development.com in local_current_membership and tries to insert it. However, the query it uses to do so tries to pull the membership from the room_memberships table, which after this kind of data loss obviously doesn't have a record of userA's membership (and thus Synapse attempts to insert NULL as the membership). I think the fix would be to just pull in the membership information from the event directly, but I'll double check with the team first to make sure there isn't a weird yet valid reason why this is not currently done this way.

With that said, there needs to be some way to forcibly kick equal-privilege accounts from rooms.

This is a very valid point, however this is related to the Matrix spec (which Synapse implements) rather than Synapse's implementation itself, so I'd suggest opening an issue about it (if there isn't already one) on https://github.com/matrix-org/matrix-doc, which is where the spec lives.

@babolivier
Contributor

Hmm, looks like my first assessment wasn't correct and I misread that bit of code.

What's really weird here is that in order to reach this query Synapse would need to have persisted the membership event, and we'd expect to see it in the room_memberships table in this case.

Could you look in the events, state_events, current_state_events, event_json and room_memberships to see if the membership event's in there? You should be able to do that for each table with SELECT * FROM [table] WHERE event_id = '$wkga3wQq3kKsgQIfUoIZ-XrLTuxjKlDijHLEFTTTYEM';.
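Running the same lookup against all five tables can be scripted. The helper below is a sketch that works with any DB-API connection; find_event, TABLES and the placeholder argument are names invented for this example. The demo at the bottom uses an in-memory sqlite3 database with stub tables so the block runs on its own.

```python
import sqlite3

TABLES = (
    "events",
    "state_events",
    "current_state_events",
    "event_json",
    "room_memberships",
)


def find_event(conn, event_id, placeholder="%s"):
    """Count rows matching event_id in each table of interest."""
    counts = {}
    cur = conn.cursor()
    for table in TABLES:
        # Table names come from the fixed tuple above, never from user input.
        cur.execute(
            f"SELECT COUNT(*) FROM {table} WHERE event_id = {placeholder}",
            (event_id,),
        )
        counts[table] = cur.fetchone()[0]
    cur.close()
    return counts


# Stand-in database for demonstration only: one column per table.
demo = sqlite3.connect(":memory:")
for table in TABLES:
    demo.execute(f"CREATE TABLE {table} (event_id TEXT)")
demo.execute("INSERT INTO events VALUES ('$abc')")
print(find_event(demo, "$abc", placeholder="?"))
```

Against a PostgreSQL-backed homeserver, a psycopg2 connection should work with the default "%s" placeholder; sqlite3 needs "?". A count of zero in every table corresponds to the "(0 rows)" results reported later in this thread.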

@babolivier babolivier added the X-Needs-Info This issue is blocked awaiting information from the reporter label Nov 26, 2021
@274below
Author

This is a very valid point, however this is related to the Matrix spec (which Synapse implements) rather than Synapse's implementation itself, so I'd suggest opening an issue about it (if there isn't already one) on https://github.com/matrix-org/matrix-doc, which is where the spec lives.

Will do, thanks!

Could you look in the events, state_events, current_state_events, event_json and room_memberships to see if the membership event's in there?

That didn't return any rows, so I dropped the database again and re-tested in case it was any testing I did that broke it, and... still no rows. The new event ID that I pulled from the new stack trace is $rUeQzpPF0chtsxQmUNJj-piIJCIba1i6pG1z7Qq7Rnk, which is what you see below.

development=# SELECT * FROM events WHERE event_id = '$rUeQzpPF0chtsxQmUNJj-piIJCIba1i6pG1z7Qq7Rnk';
 topological_ordering | event_id | type | room_id | content | unrecognized_keys | processed | outlier | depth | origin_server_ts | received_ts | sender | contains_url | instance_name | stream_ordering
----------------------+----------+------+---------+---------+-------------------+-----------+---------+-------+------------------+-------------+--------+--------------+---------------+-----------------
(0 rows)

development=# SELECT * FROM state_events WHERE event_id = '$rUeQzpPF0chtsxQmUNJj-piIJCIba1i6pG1z7Qq7Rnk';
 event_id | room_id | type | state_key | prev_state
----------+---------+------+-----------+------------
(0 rows)

development=# SELECT * FROM current_state_events WHERE event_id = '$rUeQzpPF0chtsxQmUNJj-piIJCIba1i6pG1z7Qq7Rnk';
 event_id | room_id | type | state_key | membership
----------+---------+------+-----------+------------
(0 rows)

development=# SELECT * FROM event_json WHERE event_id = '$rUeQzpPF0chtsxQmUNJj-piIJCIba1i6pG1z7Qq7Rnk';
 event_id | room_id | internal_metadata | json | format_version
----------+---------+-------------------+------+----------------
(0 rows)

development=# SELECT * FROM room_memberships WHERE event_id = '$rUeQzpPF0chtsxQmUNJj-piIJCIba1i6pG1z7Qq7Rnk';
 event_id | user_id | sender | room_id | membership | forgotten | display_name | avatar_url
----------+---------+--------+---------+------------+-----------+--------------+------------
(0 rows)

Finally, I'm wondering if I should open a second bug, as I am actually describing two different problems. One is that any other users from development.com can no longer join #room:production.com (the stack trace that I provided), and the second is that @userA:development.com can no longer join #room:production.com due to missing auth events. Should I open a second bug?

@Seirdy

Seirdy commented Nov 26, 2021

Finally, I'm wondering if I should open a second bug, as I am actually describing two different problems. One is that any other users from development.com can no longer join #room:production.com (the stack trace that I provided), and the second is that @userA:development.com can no longer join #room:production.com due to missing auth events. Should I open a second bug?

I have encountered the exact same behavior but the cause was slightly different: I was the only member of my homeserver to join a room and I was banned and then unbanned. Now nobody in the homeserver can join due to missing auth events.

@aaronraimist
Contributor

The joining issue sounds like #11373 which is fixed in v1.48.0rc1 if you want to try that

@cremesk
Contributor

cremesk commented Nov 27, 2021

With v1.48.0rc1 I see this error:

2021-11-27 09:58:24,896 - synapse.http.server - 97 - ERROR - POST-2657 - Failed handle request via 'ReplicationFederationSendEventsRestServlet': <SynapseRequest at 0x7f0fd11d8a00 method='POST' uri='/_synapse/replication/fed_send_events/FKwBeYUdmo' clientproto='HTTP/1.1' site='9094'>
Traceback (most recent call last):                                                                                                                                                                                                             
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/internet/defer.py", line 1657, in _inlineCallbacks                                                                                                                       
    result = current_context.run(                                                                                                                                                                                                              
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/python/failure.py", line 500, in throwExceptionIntoGenerator                                                                                                             
    return g.throw(self.type, self.value, self.tb)                                                                                                                                                                                             
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/replication/http/federation.py", line 134, in _handle_request                                                                                                            
    max_stream_id = await self.federation_event_handler.persist_events_and_notify(                                                                                                                                                             
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/handlers/federation_event.py", line 1861, in persist_events_and_notify                                                                                                   
    events, max_stream_token = await self._storage.persistence.persist_events(                                                                                                                                                                 
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/logging/opentracing.py", line 785, in _trace_inner                                                                                                                       
    return await func(*args, **kwargs)                                                                                                                                                                                                         
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/persist_events.py", line 326, in persist_events                                                                                                                  
    ret_vals = await yieldable_gather_results(enqueue, partitioned.items())                                                                                                                                                                    
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/persist_events.py", line 243, in handle_queue_loop                                                                                                               
    ret = await self._per_item_callback(                                                                                                                                                                                                       
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/persist_events.py", line 581, in _persist_event_batch                                                                                                            
    await self.persist_events_store._persist_events_and_state_updates(                                                                                                                                                                         
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/databases/main/events.py", line 175, in _persist_events_and_state_updates                                                                                                                                                                        
    await self.db_pool.runInteraction(                                                                                 
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 699, in runInteraction                                                                                                                        
    result = await self.runWithConnection(                                                                                                                                                                                                     
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 804, in runWithConnection                                                                                                                                                                                                     
    return await make_deferred_yieldable(                                                                                                                                                                                                      
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/python/threadpool.py", line 238, in inContext                                                                                                                            
    result = inContext.theWork()  # type: ignore[attr-defined]                                                                                                                                                                                 
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/python/threadpool.py", line 254, in <lambda>                                                                                                                             
    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]                                                                                                                                                                    
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/python/context.py", line 118, in callWithContext                                         
    return self.currentContext().callWithContext(ctx, func, *args, **kw)                                                                                                                                                                       
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/python/context.py", line 83, in callWithContext                                          
    return func(*args, **kw)                                                                                           
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/enterprise/adbapi.py", line 293, in _runWithConnection                                                                                                                                                                                                   
    compat.reraise(excValue, excTraceback)                                                                                                                                                                                                     
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/python/deprecate.py", line 298, in deprecatedFunction                                                                                                                                                                                                    
    return function(*args, **kwargs)                                                                                   
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/python/compat.py", line 404, in reraise                                                  
    raise exception.with_traceback(traceback)                                                                                                                                                                                                  
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/twisted/enterprise/adbapi.py", line 284, in _runWithConnection                                                                                                                                                                                                   
    result = func(conn, *args, **kw)                                                                                                                                                                                                           
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 799, in inner_func                                            
    return func(db_conn, *args, **kwargs)                                                                                                                                                                                                      
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 567, in new_transaction                                                                                                                                                                                                       
    r = func(cursor, *args, **kwargs)                                                                                                                                                                                                          
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/logging/utils.py", line 73, in wrapped                                                   
    return f(*args, **kwargs)                                                                                          
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/databases/main/events.py", line 397, in _persist_events_txn                                                                                                                                                                                      
    self._update_metadata_tables_txn(                                                                                  
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/databases/main/events.py", line 1488, in _update_metadata_tables_txn                                                                                                                                                                             
    self._store_retention_policy_for_room_txn(txn, event)                      
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/databases/main/events.py", line 1918, in _store_retention_policy_for_room_txn                                                                                                                                                                    
    self.db_pool.simple_insert_txn(                                            
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 892, in simple_insert_txn                                                                                                                                                                                                     
    txn.execute(sql, vals)                                                     
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 307, in execute                                               
    self._do_execute(self.txn.execute, sql, *args)                             
  File "/opt/venvs/matrix-synapse/lib/python3.9/site-packages/synapse/storage/database.py", line 340, in _do_execute                                           
    return func(sql, *args)                                                    
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "room_retention_pkey"                                                          
DETAIL:  Key (room_id, event_id)=(!SOPFfiEojgdYhVcoaL:matrix.org, $e1bQrYY4-Hh36Vrc-r2kNqv86x3JNhNATb4j959MT3E) already exists.
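The final two lines of the traceback say that a row with the same (room_id, event_id) pair was already present in room_retention. The sketch below reproduces that class of failure with Python's built-in sqlite3 and a stand-in table that has only the two key columns named in the error. It says nothing about why Synapse attempted the insert twice, and sqlite3 raises IntegrityError where psycopg2 raises UniqueViolation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in table: only the key columns named in the error message.
conn.execute(
    "CREATE TABLE room_retention ("
    " room_id TEXT, event_id TEXT,"
    " PRIMARY KEY (room_id, event_id))"
)

row = (
    "!SOPFfiEojgdYhVcoaL:matrix.org",
    "$e1bQrYY4-Hh36Vrc-r2kNqv86x3JNhNATb4j959MT3E",
)


def insert_retention(room_id, event_id):
    """Return True if the row was stored, False if the key already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO room_retention VALUES (?, ?)", (room_id, event_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_retention(*row))  # first insert of this key
print(insert_retention(*row))  # same key again violates the primary key
```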

@cremesk
Contributor

cremesk commented Nov 29, 2021

It will probably be fixed in #11440.

@richvdh
Member

richvdh commented Nov 29, 2021

the "Auth events cannot be found" error on join does sound like #11440. I don't think that will solve duplicate key value violates unique constraint "room_retention_pkey" errors, but that sounds like a separate problem (possibly specific to rooms with retention enabled) which should probably be filed as a separate issue.

@cremesk
Contributor

cremesk commented Nov 29, 2021

the "Auth events cannot be found" error on join does sound like #11440. I don't think that will solve duplicate key value violates unique constraint "room_retention_pkey" errors, but that sounds like a separate problem (possibly specific to rooms with retention enabled) which should probably be filed as a separate issue.

oh thanks for the clarification!

@274below
Author

The joining issue sounds like #11373 which is fixed in v1.48.0rc1 if you want to try that

All right, well, that actually resolved my overall issue so this can probably be closed. Once I installed 1.48.0rc1 I was able to properly rejoin the room with @userA:development.com, which resolved the database inconsistency and allowed the other accounts from the development.com instance to join the #room:production.com room.

So this whole thing likely stems from #11373 and 1.48.0rc1 resolved it for me. Thanks!

@babolivier
Contributor

Ah that makes sense indeed, thanks @aaronraimist for catching it!
