Use SledStore to store full timeline #288
Conversation
cc: @DevinR528 |
While I like the idea, I'd love to know what exactly a good reason would be for the SDK to store the timeline locally. Technically, the timeline on the server and the timeline "locally" can become out of sync once more DAG events converge and "slot" into historical timelines. When an author expects this to happen, they might want to back-paginate to collect all of these events (somehow) and display them appropriately. Or the author might expect what a server ought to do: display the correct topological ordering of the events on a request. This is good for caching, but I don't think it's good as a source of truth, as the historical timeline is subject to change once federation converges, and it's relatively easy for this to happen more than 50 events ago. Even then, duplicate events are easy to have here, as a server might "append" a converged event to a sync, which then gets persisted "late" into the timeline; when the client later back-paginates to get more events, it might encounter both the persisted "late" variant of the event and the "correct" variant the server supplies in that moment. |
Requesting events from the server can be slow. Especially for encrypted rooms we need a locally stored timeline so that we don't need to decrypt events every time the user views older messages, and there is a hard requirement for a locally stored timeline for search in encrypted rooms. Honestly, I forgot about de-duplicating events. I will add it. |
Is there a way to set the SDK to only store some rooms locally? E.g. only DMs, like you said. |
Currently no. It could be added at a later point, but I don't really see a good reason not to store the timeline. |
Ugh, sorry to clutter everything up with review comments instead of "request changes" reviews that you could actually close. If you want to leave a note on the comment, I can delete it when you're ready, or whatever. |
don't really know the difference :) |
I've now added deduplication of events, and the removal of redacted events from the store. I think this is ready for a final review now. |
This seems mainly sensible, though a bit of a cleanup of the concepts and splitting up some methods seems like it could make this nicer.
})
.and_then(|e| e.event.deserialize().ok())
{
if let Ok(Some(room_version)) = self.get_room_version(room)
Do we care about this? Can't we just use the latest room version?
Not sure, but redaction changes based on the room version and different values are stripped. From reading the spec it looks like only `m.room.aliases` has changed in version 6: before version 6, the `aliases` were allowed to be kept after redaction. I think the room version doesn't have any other implication for the store, but we may not strip the correct data.
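For illustration, a minimal sketch of the version-dependent part of redaction for `m.room.aliases` (a hypothetical helper, not the SDK's actual redaction code; real room versions are strings rather than numbers, and the other per-event-type protected keys are omitted here):

```rust
use serde_json::{Map, Value};

/// Strip a redacted event's content, keeping only what the room version
/// allows. Only the m.room.aliases special case is shown: before room
/// version 6 the `aliases` key survives redaction, from version 6 on it
/// is removed like everything else.
fn redact_content(event_type: &str, content: &mut Map<String, Value>, room_version: u32) {
    let kept_aliases = if event_type == "m.room.aliases" && room_version < 6 {
        content.remove("aliases")
    } else {
        None
    };
    content.clear();
    if let Some(aliases) = kept_aliases {
        content.insert("aliases".to_owned(), aliases);
    }
}
```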
After the previous discussion, I fear that with the current Matrix spec and implementations, implementing a local message store is (at least almost) useless. I've tried to summarize my thoughts below. Of course, feel free to disagree! This is based on my current (possibly flawed) understanding of the ordering of messages returned by syncs and `messages`. Let the following be the "true" timeline, i.e. a sequence of events with the ordering decided by the server (after reordering, what would be returned by `messages`):
Then, let this be the ordering in which events arrive at (your) homeserver. Notice that event C is delayed significantly for some reason (temporary disconnection of the homeserver of the user who sent it, for example):
Now assume that your client is running when D500 arrives, but was closed after receiving B001 via sync. @ShadowJonathan, does this describe your concerns expressed above as well? |
Yes, as currently most of this behaviour is undefined in the spec, all homeservers take different approaches with the sync, and you cannot have a guaranteed ordering story like that. I saw that coming when I first saw this PR, so I still really recommend against merging it. At most, an event-id -> content cache would work, not a timeline store like this. |
While I do agree that the raised issue is a concern to be considered, my main concern is that not merging this is going to lead to most client developers consuming this library writing their own implementation of this feature, which in most cases won't be put through as much scrutiny as an implementation in the matrix-sdk itself, and additionally won't be able to draw on any of the benefits of access to internals that come from being part of the matrix-sdk crate itself. |
I think what could be implemented with the current state of the spec and implementations would be an in-memory cache of messages received via sync and `messages`. It would have to be invalidated/thrown away whenever a message in a sync is missed. That would happen due to a large number of messages arriving in a short period, which I think would be very unlikely, however. Such a gap would be detected by comparing the start/end tokens of two successive syncs, which should be the same if no messages were missed. |
Keep in mind that messages can also be missed if the "limited"/"partial" flag was set (can't remember the exact name), which highlights a partial sync (for a particular room's timeline) |
Those two are the same case, no? Either there is a gap and the limited flag is set and the tokens differ, or there isn't a gap in which case the limited flag is not set and the tokens match. |
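A minimal sketch of that invalidation rule, assuming a simplified sync shape; the `limited` and `prev_batch` names mirror the sync API, but the cache type and method here are hypothetical, not the actual SDK code:

```rust
/// Simplified shape of a room's timeline section in a sync response.
struct SyncTimeline {
    events: Vec<serde_json::Value>,
    prev_batch: Option<String>,
    /// Set by the server when events were omitted, i.e. the slice is partial.
    limited: bool,
}

/// In-memory cache of a single room's timeline.
struct TimelineCache {
    /// `next_batch` token of the last sync that was applied.
    last_sync_token: Option<String>,
    events: Vec<serde_json::Value>,
}

impl TimelineCache {
    /// Apply a sync response: if the new slice doesn't connect to the
    /// previous one (limited flag set, or the tokens don't line up),
    /// throw the cached timeline away before appending the new events.
    fn apply_sync(&mut self, timeline: SyncTimeline, next_batch: String) {
        let gap = timeline.limited || timeline.prev_batch != self.last_sync_token;
        if gap {
            self.events.clear();
        }
        self.events.extend(timeline.events);
        self.last_sync_token = Some(next_batch);
    }
}
```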
The `TimelineSlice` is a slice of the timeline of a room and contains the start and end of the slice.
Note: this doesn't implement the timeline store for the `MemoryStore`.
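As a rough illustration of what such a slice could carry (a sketch with simplified types; the actual field names and event type in the PR may differ):

```rust
/// A contiguous slice of a room's timeline: the events plus the pagination
/// tokens that mark where the slice starts and ends. Hypothetical sketch,
/// using serde_json::Value as a stand-in for the SDK's event type.
#[derive(Debug, Clone)]
struct TimelineSlice {
    /// Token marking the start of the slice.
    start: String,
    /// Token marking the end of the slice, if known.
    end: Option<String>,
    /// Events contained in the slice (the PR stores them in backward
    /// order, i.e. newest first).
    events: Vec<serde_json::Value>,
}

impl TimelineSlice {
    fn new(start: String, end: Option<String>, events: Vec<serde_json::Value>) -> Self {
        Self { start, end, events }
    }
}
```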
I believe https://github.com/matrix-org/matrix-doc/issues/3263 hasn't been mentioned yet, but it is relevant to this PR |
With the current limits of the sync API, I think we can only implement a cache for a continuous stream of events of the timeline. This will make the cache a little less efficient, but unfortunately I don't see any way around this. On duplicated events the local cache is dropped, because it's the indication that the order of the timeline on the server has changed. Until the developer requests a new stream we will keep the local order of events, and thus we can just drop the second occurrence of the event (see the sketch below). For now I'm not considering implementing any cache for requests directly to get_message_events, get_context and get_room_event, even though I think it could be possible to serve some requests locally quite easily. Specifically, this will solve the following issues we faced so far in this PR:
The local cache is invalidated on:
This will still give us a local cache of the timeline for:
For the API, I think we could use `DoubleEndedStream`. It's still an unstable feature; therefore, I would implement it by using two `Stream`s, one forward stream and one backward stream starting from the current point in the timeline. [1] Considerations about filling the gap: |
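A rough sketch of how the duplicate-triggered invalidation and the two-stream API described in this comment could fit together; all names here are hypothetical, and the unstable `DoubleEndedStream` is replaced by two separate streams as suggested above:

```rust
use std::collections::HashSet;
use std::pin::Pin;

use futures_core::stream::Stream;

type RoomEvent = serde_json::Value; // stand-in for the SDK's event type
type EventStream = Pin<Box<dyn Stream<Item = RoomEvent> + Send>>;

struct TimelineCache {
    event_ids: HashSet<String>,
    events: Vec<RoomEvent>,
    /// Set when a duplicate arrives: the server-side ordering has changed,
    /// so the cache is rebuilt the next time a new stream is requested.
    invalidated: bool,
}

impl TimelineCache {
    /// Apply an incoming event. The second occurrence of a duplicate is
    /// dropped so the local ordering stays stable for current streams.
    fn handle_event(&mut self, event_id: String, event: RoomEvent) {
        if !self.event_ids.insert(event_id) {
            self.invalidated = true;
            return;
        }
        self.events.push(event);
    }
}

/// The API surface suggested above: instead of a `DoubleEndedStream`,
/// hand out two independent streams anchored at the current point in
/// the timeline.
trait RoomTimeline {
    /// Newer events, fed by incoming sync responses.
    fn forward_stream(&self) -> EventStream;
    /// Older events, served from the cache and back-filled from the
    /// homeserver once the cached timeline is exhausted.
    fn backward_stream(&self) -> EventStream;
}
```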
This looks promising! I want to point out that I'm very pleased with this conclusion: initially I thought messing with the limitations was a non-starter, but delivering such a cache with the above promises seems to work, while also balancing it against developer and application ease. Thanks a lot! |
It looks like force_auth is working correctly when restoring the session, but it's not working with the call to client.login, so it cannot be used. The access_token is not available in the login query, so if the force_auth option is set, this call will always fail. This patch just ignores force_auth if there's no session, so any query will work when force_auth is true. Fix matrix-org#488
closing in favor of #486 |
|
This implements storing and retrieving the full timeline via the `SledStore`. When a `sync` or `messages` response is received, the events are written to the store in the same direction (backward). The implementation merges consecutive batches as well as overlapping batches into a single batch. For each batch that isn't consecutive, the store keeps track of the `prev_batch` and `next_batch` of the batch. If a disconnected batch (a response that can't be merged into an already known batch) is received, a new batch is added to the store and its `prev_batch` and `next_batch` are tracked separately. It may be possible to merge the batches at a later point, but that would require moving an entire batch from one location to another inside the store, which may be expensive; therefore I would keep them as distinct batches that can grow independently.

When reading events, the store automatically continues reading events from another known batch whenever possible. Once the end of the stored timeline is reached, the found events are returned, and if not all requested events were found, a `token` is added to the response that can be used to retrieve more events from the homeserver.

Please note that this doesn't implement storing the timeline to the `MemoryStore`.

Fixes: #138
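A rough sketch of the batch-merging rule described above, with simplified stand-in types (the real `SledStore` layout and names differ); only the simplest case, appending a slice at the end of a known batch, is shown:

```rust
/// A received slice of the timeline with its pagination tokens.
struct TimelineSlice {
    start: String,
    end: Option<String>,
    events: Vec<serde_json::Value>,
}

/// A batch kept in the store, growing as connecting slices arrive.
struct StoredBatch {
    start: String,
    end: Option<String>,
    events: Vec<serde_json::Value>,
}

struct TimelineStore {
    batches: Vec<StoredBatch>,
}

impl TimelineStore {
    /// Add a newly received slice: if its start token matches the end
    /// token of a known batch, that batch is extended; otherwise the
    /// slice becomes a disconnected batch whose tokens are tracked
    /// separately so it can grow independently and maybe be merged later.
    fn add_slice(&mut self, slice: TimelineSlice) {
        for batch in &mut self.batches {
            if batch.end.as_deref() == Some(slice.start.as_str()) {
                batch.events.extend(slice.events);
                batch.end = slice.end;
                return;
            }
        }
        self.batches.push(StoredBatch {
            start: slice.start,
            end: slice.end,
            events: slice.events,
        });
    }
}
```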