Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New group consistency algorithm #6404

Merged
merged 1 commit into from
Jan 11, 2025
Merged

New group consistency algorithm #6404

merged 1 commit into from
Jan 11, 2025

Conversation

link2xt
Copy link
Collaborator

@link2xt link2xt commented Jan 4, 2025

This implements the basic group consistency algorithm described in #6401

It is already ready for review because all the tests pass. Removing tombstones after 60 days and a test for it will be another PR. Important note about handling inactive groups to avoid-readding members if someone restores old backup is at #6401 (comment)

There are some attempts to workaround the case when "member added" (
0a0e715) or "member removed" (1855f84) messages fail to be sent. I am removing them here, going to update add_timestamp or remove_timestamp locally and then do an attempt to send the message out. If the message is not sent because of missing peerstate or because the app is killed, this is the same as having the network lose the message. Especially in the case of removing multiple members by making multiple parallel JSON-RPC or FFI calls it seems better to update the timestamp locally and then send out local state than sending out multiple "member removed" messages which each remove one member but not all the others, not sure if it even worked properly in case the bot or the UI requests multiple member removal in parallel.

@link2xt link2xt force-pushed the link2xt/group-consistency branch from c10da92 to c66590b Compare January 4, 2025 21:33
@link2xt link2xt changed the base branch from main to link2xt/putnmwzpnlpp January 4, 2025 21:34
@link2xt link2xt force-pushed the link2xt/group-consistency branch from c66590b to 2c16e9b Compare January 4, 2025 21:36
Base automatically changed from link2xt/putnmwzpnlpp to main January 5, 2025 00:36
@link2xt link2xt force-pushed the link2xt/group-consistency branch 3 times, most recently from 4c7102e to 6cfccd5 Compare January 5, 2025 03:00
@link2xt link2xt force-pushed the link2xt/group-consistency branch from 39143a1 to f03bf1d Compare January 5, 2025 10:46
@link2xt link2xt force-pushed the link2xt/group-consistency branch from 6eb0571 to d7fe3d3 Compare January 6, 2025 05:37
@ell1e
Copy link

ell1e commented Jan 6, 2025

as it is not obvious how to avoid users restoring backup from adding members back.

Sorry for the uninformed input, I just find algorithm questions interesting. What about a timestamp, where if the backup is older than 24 hours then it might prompt other group members via an automated mail to try to get the updated member list? (This should probably be done in some way to avoid that being done repeatedly when the system time is wrong, however.)

@iequidoo
Copy link
Collaborator

iequidoo commented Jan 6, 2025

Sorry for the uninformed input, I just find algorithm questions interesting. What about a timestamp, where if the backup is older than 24 hours then it might prompt other group members via an automated mail to try to get the updated member list? (This should probably be done in some way to avoid that being done repeatedly when the system time is wrong, however.)

The problem is that in the given algorithm it's not possible to find out whose state for the given membership is actual because timestamps are forgotten after 60 days. We can't store timestamps forever, that wouldn't scale and isn't good for privacy (there should be some time limit after which it shouldn't be known for new members when a particular member was added or removed in the past). As for automated mails in general, currently there are not many such ones, we should avoid them to not reveal the user's network state, whether they restore a backup (as in your question) etc.

@ell1e
Copy link

ell1e commented Jan 8, 2025

The problem is that in the given algorithm it's not possible to find out whose state for the given membership is actual because timestamps are forgotten after 60 days.

In the backup the timestamp would still be known, wouldn't it? So if it's super old, I think the natural intuitive thing to do would be to query somebody if they have a state that is newer than 60 days back. Perhaps an alternative would be to mark your own state as possibly faulty and wait for the next message to update it (and until then avoid actions that would propagate your own faulty state to others). I hope this input is useful.

@link2xt
Copy link
Collaborator Author

link2xt commented Jan 8, 2025

@ell1e We have a solution idea at #6401 (comment), this is what I plan to implement unless some critical flaw is discovered in it.

I think the natural intuitive thing to do would be to query somebody if they have a state that is newer than 60 days back.

I generally don't expect this kind of solution that adds another message to the protocol to work in distributed systems, it is too similar to adding an "acknowledgment to acknowledgment" while trying to solve two generals problem. Such message itself may be lost, delayed, reordered, so we get even more edge cases: 1. Nobody has a new state because the group was inactive for 60 days so nobody replies. 2. Someone replies but super late or the message arrives late. 3. You restore backup and read such reply from IMAP archive which was new at that time but is outdated by now. 4. You get multiple replies but they are reordered so you get older reply first. 5. and so on.

@link2xt link2xt marked this pull request as ready for review January 8, 2025 09:08
@hpk42
Copy link
Contributor

hpk42 commented Jan 8, 2025

@ell1e We have a solution idea at #6401 (comment), this is what I plan to implement unless some critical flaw is discovered in it.

I think the natural intuitive thing to do would be to query somebody if they have a state that is newer than 60 days back.

I generally don't expect this kind of solution that adds another message to the protocol to work in distributed systems, it is too similar to adding an "acknowledgment to acknowledgment" while trying to solve two generals problem.

For those curious, "two generals" refers to https://en.m.wikipedia.org/wiki/Two_Generals'_Problem ... A fundamental consideration about synchronisation in distributed computing. Basically "adding another acknowledgement or request message" typically does not solve problems. There is an equivalent in human communication when people try to synchronize on an appointment clock time and place. Messages are read later or not at all, and there is then uncertainty on both sides because they fail to be sure in the sense that each participant is sure themselves and sure that the other side is sure about the appointment. What is helpful is a) a phone call, or other very low delay communication b) an opportunistic leap from one or the other side ("I'll be at the cafe at 4pm and stick around for an hour or two" can opportunically create synchronization in a single message at the risk of enjoying a cafe visit alone or meeting an interesting stranger or reading a book :)

In DC developer discussions it's usually enough to utter "two generals" when countering a network synchronisation protocol suggestion ... A related utterance is "synchronization is hard and expensive". Ten years ago "synchronization architectures" were on vogue also because of Blockchain-mania. A child from those times, Matrix messaging is modeled on top of synchronization while delta uses actor-networking (send and receive of messages). Delta generally uses a P2P networking model with stateless super-dumb transports whereas most other messengers, including supposedly P2P messengers like Briar, use a "centralized sync" model for group messaging.

We also looked at several state of the art research papers and other messenger implementations and were surprised we are basically dealing with an unsolved problem after two decades of P2P hype, money and r&d in that space.

@ell1e
Copy link

ell1e commented Jan 8, 2025

Thanks for explaining! I still feel like such a message could have some use if it's not the only mechanism relied upon (as in only used if it's really recent and useful), but I guess if it works without that and perhaps even better so, fair enough. And the argument that it may signal a backup restore to somebody watching seems somewhat compelling.

Copy link
Collaborator

@Hocuri Hocuri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm almost done reviewing, but have to leave for today

I tried to mark my comments as "blocking" = I think that this needs to be changed and "non-blocking" = I think that it would be nice to change this but if you disagree, it's fine.

An off-topic thought I just had: Only the bigger number of add_timestamp and remove_timestamp is actually important - it never makes any difference what value the lower number has. So, an equivalent algorithm would be to have a column change_timestamp (or just timestamp or last_change) and a column is_member.

src/chat.rs Outdated
Comment on lines 3640 to 3657
/// Removes a contact from the chat
/// my updating the `remove_timestamp`.
pub(crate) async fn remove_from_chat_contacts_table(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea(non-blocking): What about putting the algorithm (i.e. the content of #6401) into a document and linking to it from remove_from_chat_contacts_table(), apply_group_changes(), etc? Alternatively, just linking to the issue?

@iequidoo
Copy link
Collaborator

iequidoo commented Jan 9, 2025

2. Will probably add it, but not going to trash no-op explicit additions/removals anymore.

+1

Especially in the case of removing multiple members by making multiple parallel JSON-RPC or FFI calls it seems better to update the timestamp locally and then send out local state than sending out multiple "member removed" messages which each remove one member but not all the others

Anyway we need Chat-Group-Member-Removed headers for removed members to let the remaining members display the proper info message, maybe we can include it several times?

@link2xt link2xt force-pushed the link2xt/group-consistency branch from 44ac0c8 to 94d81d1 Compare January 9, 2025 02:26
@link2xt link2xt force-pushed the link2xt/group-consistency branch from 28e198b to 3522b2a Compare January 9, 2025 15:01
@link2xt
Copy link
Collaborator Author

link2xt commented Jan 9, 2025

Anyway we need Chat-Group-Member-Removed headers for removed members to let the remaining members display the proper info message, maybe we can include it several times?

Having multiple additions/removals in one message would be nice, but is not compatible to existing DC clients, so definitely not something for this and follow-up PRs.

I also think it should not be done by including the same header multiple times. Including the same header several times is not good for header protection, current header protection draft does not expect this and general assumption is that headers other than Received are never repeated. The only other exception is Autocrypt-Gossip, but this never appears outside the encrypted payload so does not need protection as well. It's better not to introduce new headers that are repeated. There are libraries that manage headers as dictionaries/maps e.g. in Python and mess up the order, especially mailman is known for breaking DKIM because of this, and requiring that all clients correctly manage headers as multimap is adding another barrier for adoption of header protection.

@link2xt link2xt force-pushed the link2xt/group-consistency branch 3 times, most recently from 6fa1f87 to e2616c7 Compare January 10, 2025 02:55
src/chat.rs Outdated
FROM chats_contacts cc
LEFT JOIN contacts c
ON c.id=cc.contact_id
WHERE cc.chat_id=? AND cc.add_timestamp < cc.remove_timestamp
ORDER BY c.id=1, c.last_seen DESC, c.id DESC;",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better sort by remove_timestamp? But then we shouldn't bump it unnecessarily in other places

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't matter much, in fact, we could just not sort it at all, whatever - we can discuss about this if and when we decide to show this info in the UI.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but we shouldn't bump remove_timestamp just so, otherwise we need to wait for 60 days (timestamp expiration period) so that the sorting is correct if we decide to sort by remove_timestamp. Reminding about this just in case

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we bump remove_timestamp unnecessarily?

If we get another timestamp that says someone was removed recently, we need to update remove_timestamp so it is more effective.

I want to add 60-day expiration after this PR so will look into corner cases anyway, but if you already see some then comments are helpful.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we bump remove_timestamp unnecessarily?

It seems that only in update_chat_contacts_table(), but AFAIU this function is only for compatibility with old Delta Chat and MUA, so maybe that's not critical.

@@ -5,4 +5,5 @@ Msg#11: info (Contact#Contact#Info): Member [email protected] added. [NOTICED][IN
Msg#12: info (Contact#Contact#Info): Member fiona ([email protected]) removed. [NOTICED][INFO]
Msg#13: bob (Contact#Contact#11): Member [email protected] added by bob ([email protected]). [FRESH][INFO]
Msg#14: Me (Contact#Contact#Self): You added member fiona ([email protected]). [INFO] o
Msg#15: bob (Contact#Contact#11): Member fiona ([email protected]) removed by bob ([email protected]). [FRESH][INFO]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks not good, maybe add such no-op messages as hidden=1?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same question as the removal of test_no_op_member_added_is_trash, see the answer above.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, but if fiona remains in the group, such a message looks like a false statement, maybe we should sort such "delayed" info messages above? We can use the removal timestamp instead of the "Date" header for sorting. For this we can e.g. return from apply_group_chages() not only the "better message" text, but also its timestamp.

@@ -3470,7 +3481,7 @@ pub async fn create_group_chat(

let chat_id = ChatId::new(u32::try_from(row_id)?);
if !is_contact_in_chat(context, chat_id, ContactId::SELF).await? {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, what is the purpose of this check? Does the result depend on protect somehow?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks unnecessary, could be removed I think.

Copy link
Collaborator

@Hocuri Hocuri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting, looking forward to seeing how this works in the real world!! I tested it manually on my phone with a very simple group of 3 members, and it worked.

@@ -3618,17 +3637,21 @@ pub(crate) async fn add_to_chat_contacts_table(
Ok(())
}

/// remove a contact from the chats_contact table
/// Removes a contact from the chat
/// my updating the `remove_timestamp`.
pub(crate) async fn remove_from_chat_contacts_table(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yes, of course I meant remove_from_chat_members, remove_from_chat_in_db, or similar

This implements new group consistency algorithm described in
<#6401>

New `Chat-Group-Member-Timestamps` header is added
to send timestamps of member additions and removals.
Member is part of the chat if its addition timestamp
is greater or equal to the removal timestamp.
@link2xt link2xt force-pushed the link2xt/group-consistency branch from a11f958 to de63527 Compare January 11, 2025 07:55
@link2xt link2xt merged commit de63527 into main Jan 11, 2025
36 checks passed
@link2xt link2xt deleted the link2xt/group-consistency branch January 11, 2025 09:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants