Use SledStore to store full timeline #288
Conversation
cc: @DevinR528 |
While I like the idea, I'd love to know what exactly a good reason would be for the SDK to store the timeline locally. Technically, the timeline on the server and the timeline "locally" can become out of sync once more DAG events converge and "slot" into historical timelines. When an author expects this to happen, they might want to back-paginate to collect all of these events (somehow) and display them appropriately. Or the author might expect what a server ought to do: display the correct topological ordering of the events on a request. This is good for caching, but I don't think it's good as a source of truth, as the historical timeline is subject to change once federation converges, and it's relatively easy for this to happen more than 50 events ago. Even then, duplicate events are easy to have here, as a server might "append" a converged event to a sync, which then gets persisted "late" into the timeline; when the client later back-paginates to get more events, it might encounter both the persisted "late" variant of the event and the "correct" variant the server supplies in that moment. |
Requesting events from the server can be slow. Especially for encrypted rooms we need a locally stored timeline so that we don't need to decrypt events every time the user views older messages, and there is a hard requirement for a locally stored timeline for search in encrypted rooms. Honestly, I forgot about de-duplicating events. I will add it. |
Is there a way to set the SDK to only store some rooms locally? E.g. only DMs, like you said. |
Currently no. It could be added at a later point, but I don't really see a good reason not to store the timeline. |
Ugh, sorry to clutter everything up with review comments instead of "request changes" reviews that you could actually close. If you want to leave a note on the comment, I can delete it when you're ready, or whatever. |
don't really know the difference :) |
I've now added deduplication of events, and the removal of redacted events from the store. I think this is ready for a final review now. |
This seems mainly sensible, though a bit of a cleanup of the concepts and splitting up some methods seems like it could make this nicer.
})
.and_then(|e| e.event.deserialize().ok())
{
if let Ok(Some(room_version)) = self.get_room_version(room)
Do we care about this? Can't we just use the latest room version?
Not sure, but redaction changes based on the room version and different values are stripped. From reading the spec it looks like only `m.room.aliases` has changed in version 6: before version 6, the `aliases` were allowed to be kept after redaction. I think the room version doesn't have any other implication for the store, but we may not strip the correct data.
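For illustration, a minimal sketch of the version-dependent part of redaction for `m.room.aliases` (a hypothetical helper, not the SDK's actual redaction code; real room versions are strings rather than numbers, and the other per-event-type protected keys are omitted here):

```rust
use serde_json::{Map, Value};

/// Strip a redacted event's content, keeping only what the room version
/// allows. Only the m.room.aliases special case is shown: before room
/// version 6 the `aliases` key survives redaction, from version 6 on it
/// is removed like everything else.
fn redact_content(event_type: &str, content: &mut Map<String, Value>, room_version: u32) {
    let kept_aliases = if event_type == "m.room.aliases" && room_version < 6 {
        content.remove("aliases")
    } else {
        None
    };
    content.clear();
    if let Some(aliases) = kept_aliases {
        content.insert("aliases".to_owned(), aliases);
    }
}
```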
After the previous discussion, I fear that with the current Matrix spec and implementations, implementing a local message store is (at least almost) useless. I've tried to summarize my thoughts below. Of course, feel free to disagree! This is based on my current (possibly flawed) understanding of the ordering of messages returned by syncs and `messages`. Let the following be the "true" timeline, i.e. a sequence of events with the ordering decided by the server (after reordering, what would be returned by `messages`):
Then, let this be the ordering in which events arrive at (your) homeserver. Notice that event C is delayed significantly for some reason (temporary disconnection of the homeserver of the user who sent it, for example):
Now assume that your client is running when D500 arrives, but was closed after receiving B001 via sync. @ShadowJonathan, does this describe your concerns expressed above as well? |
Yes, as currently most of this behaviour is undefined in the spec, all homeservers take different approaches with the sync, and you cannot have a guaranteed ordering story like that. I saw that coming when I first saw this PR, so I still really recommend against merging it. At most, an event-id -> content cache would work, not a timeline store like this. |
While I do agree that the raised issue is a concern to be considered, my main concern is that not merging this is going to lead to most client developers consuming this library writing their own implementation of this feature, which in most cases won't be put through as much scrutiny as an implementation in the matrix-sdk itself, and additionally won't be able to draw on any of the benefits of access to internals that come from being part of the matrix-sdk crate itself. |
I think what could be implemented with the current state of the spec and implementations would be an in-memory cache of messages received via sync and `messages`. It would have to be invalidated/thrown away whenever a message in a sync is missed. That would happen due to a large number of messages arriving in a short period, which I think would be very unlikely, however. Such a gap would be detected by comparing the start/end tokens of two successive syncs, which should be the same if no messages were missed. |
Keep in mind that messages can also be missed if the "limited"/"partial" flag was set (can't remember the exact name), which highlights a partial sync (for a particular room's timeline) |
Those two are the same case, no? Either there is a gap and the limited flag is set and the tokens differ, or there isn't a gap in which case the limited flag is not set and the tokens match. |
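A minimal sketch of that invalidation rule, assuming a simplified sync shape; the `limited` and `prev_batch` names mirror the sync API, but the cache type and method here are hypothetical, not the actual SDK code:

```rust
/// Simplified shape of a room's timeline section in a sync response.
struct SyncTimeline {
    events: Vec<serde_json::Value>,
    prev_batch: Option<String>,
    /// Set by the server when events were omitted, i.e. the slice is partial.
    limited: bool,
}

/// In-memory cache of a single room's timeline.
struct TimelineCache {
    /// `next_batch` token of the last sync that was applied.
    last_sync_token: Option<String>,
    events: Vec<serde_json::Value>,
}

impl TimelineCache {
    /// Apply a sync response: if the new slice doesn't connect to the
    /// previous one (limited flag set, or the tokens don't line up),
    /// throw the cached timeline away before appending the new events.
    fn apply_sync(&mut self, timeline: SyncTimeline, next_batch: String) {
        let gap = timeline.limited || timeline.prev_batch != self.last_sync_token;
        if gap {
            self.events.clear();
        }
        self.events.extend(timeline.events);
        self.last_sync_token = Some(next_batch);
    }
}
```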
The `TimelineSlice` is a slice of the timeline of a room and contains the start and end of the slice.
Note: this doesn't implement the timeline store for the `MemoryStore`.
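As a rough illustration of what such a slice could carry (a sketch with simplified types; the actual field names and event type in the PR may differ):

```rust
/// A contiguous slice of a room's timeline: the events plus the pagination
/// tokens that mark where the slice starts and ends. Hypothetical sketch,
/// using serde_json::Value as a stand-in for the SDK's event type.
#[derive(Debug, Clone)]
struct TimelineSlice {
    /// Token marking the start of the slice.
    start: String,
    /// Token marking the end of the slice, if known.
    end: Option<String>,
    /// Events contained in the slice (the PR stores them in backward
    /// order, i.e. newest first).
    events: Vec<serde_json::Value>,
}

impl TimelineSlice {
    fn new(start: String, end: Option<String>, events: Vec<serde_json::Value>) -> Self {
        Self { start, end, events }
    }
}
```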
I believe https://github.com/matrix-org/matrix-doc/issues/3263 hasn't been mentioned yet, but it is relevant to this PR |
With the current limits of the sync API, I think we can only implement a cache for a continuous stream of events of the timeline. This will make the cache a little less efficient, but unfortunately I don't see any way around this. On duplicated events the local cache is dropped, because it's the indication that the order of the timeline on the server has changed. Until the developer requests a new stream we will keep the local order of events, and thus we can just drop the second occurrence of the event (see the sketch below). For now I'm not considering implementing any cache for requests directly to get_message_events, get_context and get_room_event, even though I think it could be possible to serve some requests locally quite easily. Specifically, this will solve the following issues we faced so far in this PR:
The local cache is invalidated on:
This will still give us a local cache of the timeline for:
For the API, I think we could use `DoubleEndedStream`. It's still an unstable feature; therefore, I would implement it by using two `Stream`s, one forward stream and one backward stream starting from the current point in the timeline. [1] Considerations about filling the gap: |
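A rough sketch of how the duplicate-triggered invalidation and the two-stream API described in this comment could fit together; all names here are hypothetical, and the unstable `DoubleEndedStream` is replaced by two separate streams as suggested above:

```rust
use std::collections::HashSet;
use std::pin::Pin;

use futures_core::stream::Stream;

type RoomEvent = serde_json::Value; // stand-in for the SDK's event type
type EventStream = Pin<Box<dyn Stream<Item = RoomEvent> + Send>>;

struct TimelineCache {
    event_ids: HashSet<String>,
    events: Vec<RoomEvent>,
    /// Set when a duplicate arrives: the server-side ordering has changed,
    /// so the cache is rebuilt the next time a new stream is requested.
    invalidated: bool,
}

impl TimelineCache {
    /// Apply an incoming event. The second occurrence of a duplicate is
    /// dropped so the local ordering stays stable for current streams.
    fn handle_event(&mut self, event_id: String, event: RoomEvent) {
        if !self.event_ids.insert(event_id) {
            self.invalidated = true;
            return;
        }
        self.events.push(event);
    }
}

/// The API surface suggested above: instead of a `DoubleEndedStream`,
/// hand out two independent streams anchored at the current point in
/// the timeline.
trait RoomTimeline {
    /// Newer events, fed by incoming sync responses.
    fn forward_stream(&self) -> EventStream;
    /// Older events, served from the cache and back-filled from the
    /// homeserver once the cached timeline is exhausted.
    fn backward_stream(&self) -> EventStream;
}
```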
This looks promising! I want to point out that I'm very pleased with this conclusion: initially I thought messing with the limitations was a non-starter, but delivering such a cache with the above promises seems to work, while also balancing it against developer and application ease. Thanks a lot! |
It looks like force_auth is working correctly when restoring the session, but it's not working with the call to client.login, so it cannot be used. The access_token is not available in the login query, so if the force_auth option is set, this call will always fail. This patch just ignores force_auth if there's no session, so any query will work when force_auth is true. Fix matrix-org#488
closing in favor of #486 |
|
This implements storing and retrieving the full timeline via the `SledStore`. When a `sync` or `messages` response is received, the events are written to the store in the same direction (backward). The implementation merges consecutive batches as well as overlapping batches into a single batch. For each batch that isn't consecutive, the store keeps track of the `prev_batch` and `next_batch` of the batch. If a disconnected batch (a response that can't be merged into an already known batch) is received, a new batch is added to the store and its `prev_batch` and `next_batch` are tracked separately. It may be possible to merge the batches at a later point, but that would require moving an entire batch from one location to another inside the store, which may be expensive; therefore I would keep them as distinct batches that can grow independently.

When reading events, the store automatically continues reading events from another known batch whenever possible. Once the end of the stored timeline is reached, the found events are returned, and if not all requested events were found, a `token` is added to the response that can be used to retrieve more events from the homeserver.

Please note that this doesn't implement storing the timeline to the `MemoryStore`.

Fixes: #138
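A rough sketch of the batch-merging rule described above, with simplified stand-in types (the real `SledStore` layout and names differ); only the simplest case, appending a slice at the end of a known batch, is shown:

```rust
/// A received slice of the timeline with its pagination tokens.
struct TimelineSlice {
    start: String,
    end: Option<String>,
    events: Vec<serde_json::Value>,
}

/// A batch kept in the store, growing as connecting slices arrive.
struct StoredBatch {
    start: String,
    end: Option<String>,
    events: Vec<serde_json::Value>,
}

struct TimelineStore {
    batches: Vec<StoredBatch>,
}

impl TimelineStore {
    /// Add a newly received slice: if its start token matches the end
    /// token of a known batch, that batch is extended; otherwise the
    /// slice becomes a disconnected batch whose tokens are tracked
    /// separately so it can grow independently and maybe be merged later.
    fn add_slice(&mut self, slice: TimelineSlice) {
        for batch in &mut self.batches {
            if batch.end.as_deref() == Some(slice.start.as_str()) {
                batch.events.extend(slice.events);
                batch.end = slice.end;
                return;
            }
        }
        self.batches.push(StoredBatch {
            start: slice.start,
            end: slice.end,
            events: slice.events,
        });
    }
}
```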