Is your feature request related to a problem?
Currently, session recordings reach us via the following pipeline (a rough sketch follows the list):
1. posthog-js captures a batch of events, compresses them, and sends them to the /s endpoint
2. we decompress these events and put each one into celery/kafka as its own task
3. at the other end, the plugin server reads these events and writes them to the appropriate database
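As a rough, hypothetical sketch of the server side of steps 1-3 (assumed function names; not the actual ingestion code), the handler currently fans events out one at a time:

```python
import gzip
import json

def enqueue(event: dict) -> None:
    """Stand-in for the real celery/kafka producer call."""
    print("enqueued:", event.get("event"))

def handle_s_request(body: bytes) -> None:
    # step 2: decompress the posted batch, then enqueue every event as its
    # own task, so each event, however large, must fit in a single message
    for event in json.loads(gzip.decompress(body)):
        enqueue(event)
```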
To retrieve a list of recordings, we:
4. Fetch all session recording $session_ids in the time range
5. Filter out those without a full page snapshot (see the sketch after this list)
6. Display the list
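A minimal sketch of the filter in step 5 (hypothetical event shape and flag name; the real filtering happens in the database):

```python
from typing import Dict, List

def sessions_with_full_snapshot(snapshot_events: List[dict]) -> List[str]:
    # a recording is only listable if at least one of its $snapshot events
    # is a full snapshot; if that event was dropped during ingestion, the
    # whole recording silently disappears from the list
    has_full: Dict[str, bool] = {}
    for event in snapshot_events:
        session_id = event["properties"]["$session_id"]
        is_full = event["properties"].get("has_full_snapshot", False)
        has_full[session_id] = has_full.get(session_id, False) or is_full
    return [sid for sid, ok in has_full.items() if ok]
```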
We are regularly missing some full snapshot events at step 2 because of Kafka's message size limit.
Describe the solution you'd like
Use compression and chunking to make sure we don't drop any events.
Some requirements (see the sketch below):
* If we're dealing with session recording events, compress and chunk the whole batch and send the chunks, instead of sending each event individually.
* The chunked data needs to carry information about whether it contains a full_snapshot event or not.
* Handle a single row containing many session recording events in the backend.
* Skip decompression when it's not needed: instead, indicate via the URL that this is a session recording request and whether it contains a full snapshot.

Not sure whether the last point is achievable, since API tokens etc. live inside the compressed payload.
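A minimal sketch of the capture-side change, under assumed names (chunk_id/chunk_index/chunk_count, CHUNK_SIZE) and a simplified event shape, using rrweb's convention that event type 2 is a full snapshot:

```python
import base64
import gzip
import json
import math
import uuid
from typing import List

CHUNK_SIZE = 512 * 1024  # assumed chunk size, safely under Kafka's ~1 MB default
FULL_SNAPSHOT = 2        # rrweb event type 2 is a full snapshot

def chunk_snapshot_events(events: List[dict], session_id: str) -> List[dict]:
    # compress the whole batch once, base64-encode it so the chunks are
    # plain strings, then split into fixed-size pieces that each fit in
    # one Kafka message
    data = base64.b64encode(gzip.compress(json.dumps(events).encode())).decode()
    chunk_id = str(uuid.uuid4())
    chunk_count = math.ceil(len(data) / CHUNK_SIZE)
    has_full_snapshot = any(e.get("type") == FULL_SNAPSHOT for e in events)
    return [
        {
            "$session_id": session_id,
            "$snapshot_data": {
                "chunk_id": chunk_id,  # ties the pieces back together
                "chunk_index": index,
                "chunk_count": chunk_count,
                "data": data[index * CHUNK_SIZE : (index + 1) * CHUNK_SIZE],
                # carried outside the compressed blob so the backend can
                # answer "does this recording have a full snapshot?" cheaply
                "has_full_snapshot": has_full_snapshot,
            },
        }
        for index in range(chunk_count)
    ]
```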
Closes #3632 and replaces https://github.com/PostHog/posthog/pull/3566/files
This should make it possible to ingest large full snapshot events.
Base64 is used to encode the compressed data for serialization purposes.
pytest.mock is used for cleanly patching methods.
* Chunk session recording events
* Mock time.time for py3.7 compatibility
* Group captured $snapshot events by $session_id
* Don't chunk already chunked payloads
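On the read path, the backend then has to stitch a recording back together from many stored rows. A hedged sketch, reusing the hypothetical field names from the chunking sketch above (each row here is one stored $snapshot_data payload):

```python
import base64
import gzip
import json
from collections import defaultdict
from typing import Dict, List

def reassemble(rows: List[dict]) -> List[dict]:
    # group chunks by chunk_id, order them by chunk_index, then undo the
    # base64 + gzip steps to recover the original batch of events
    by_id: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        by_id[row["chunk_id"]].append(row)
    events: List[dict] = []
    for chunks in by_id.values():
        chunks.sort(key=lambda c: c["chunk_index"])
        if len(chunks) != chunks[0]["chunk_count"]:
            continue  # a chunk went missing; skip the incomplete batch
        data = "".join(chunk["data"] for chunk in chunks)
        events.extend(json.loads(gzip.decompress(base64.b64decode(data))))
    return events
```

The "don't chunk already chunked payloads" commit implies a capture-side guard as well, e.g. checking for an existing chunk_id before chunking a payload again.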
Describe alternatives you've considered
Additional context
https://github.com/PostHog/posthog/pull/3566/files made a start on this process.