Chunking session recording events #3632

Closed
macobo opened this issue Mar 12, 2021 · 0 comments · Fixed by #3705
Labels
enhancement New feature or request

macobo commented Mar 12, 2021

Is your feature request related to a problem?

Currently session recordings reach us via the following pipeline:

  1. posthog-js captures a bunch of events, compresses them and sends them to the /s endpoint
  2. we decompress these events and put each one into celery/kafka as a task
  3. On the other end, the plugin server reads these events and writes them into the appropriate database

To retrieve a list of recordings, we:

  4. Fetch all session recording $session_ids in the time range
  5. Filter out those without a full page snapshot
  6. Display the list
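
For illustration, the filtering in step 5 could be sketched like this. It is a minimal sketch and not the actual query: the `session_id` and `snapshot_type` keys are hypothetical, and it assumes a full page snapshot is an rrweb event of type 2.

```python
FULL_SNAPSHOT = 2  # rrweb EventType.FullSnapshot; assumed to be the "full page snapshot"


def sessions_with_full_snapshot(events):
    """Return session ids, in first-seen order, that have at least one full snapshot.

    `events` is an iterable of dicts with the hypothetical keys
    "session_id" and "snapshot_type"; the real row schema may differ.
    """
    has_full = {}
    for event in events:
        sid = event["session_id"]
        is_full = event["snapshot_type"] == FULL_SNAPSHOT
        # Remember every session, but only flag it once a full snapshot shows up.
        has_full[sid] = has_full.get(sid, False) or is_full
    return [sid for sid, ok in has_full.items() if ok]
```

A session whose full snapshot was dropped during ingestion never gets flagged here, so it silently disappears from the list.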

We regularly lose some full snapshot events at step 2 because they exceed the kafka message size limit.
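
To make the failure mode concrete, here is a rough sketch of the size check involved. The 1 MB limit is an assumption for illustration only; the real limit depends on broker and producer configuration, and `exceeds_message_limit` is a hypothetical helper.

```python
import json

# Assumption: roughly 1 MB per message. The actual limit is whatever the
# broker and producer are configured with.
ASSUMED_MAX_MESSAGE_BYTES = 1_000_000


def exceeds_message_limit(event, limit=ASSUMED_MAX_MESSAGE_BYTES):
    """True if the JSON-serialized event would not fit into a single message."""
    return len(json.dumps(event).encode("utf-8")) > limit
```

Incremental snapshots are usually small, while a full snapshot serializes the whole page and can easily go over a limit like this.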

Describe the solution you'd like

Use compression and chunking to make sure we don't drop any events.

Some requirements:

  1. If we're dealing with session recording events, compress, chunk and send all the events together instead of sending each one individually.
  2. The chunked data needs to carry information on whether the event is a full_snapshot event or not.
  3. Handle one row containing many session recording events in the backend.
  4. Skip decompression when it's not needed. Instead, include information via the URL that this is a session recording request and whether it contains a full snapshot.

Not sure if the last point is achievable, since api tokens etc. are inside the compressed payload.
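
A minimal sketch of requirements 1 to 3, not the final implementation. It assumes gzip for compression and base64 to make the bytes JSON-safe, and all field names (`chunk_id`, `chunk_index`, `chunk_count`, `has_full_snapshot`) as well as the chunk size are hypothetical.

```python
import base64
import gzip
import json
import uuid

FULL_SNAPSHOT = 2  # rrweb EventType.FullSnapshot (assumed event format)


def compress_and_chunk(events, chunk_size=512 * 1024):
    """Compress a batch of snapshot events and split the result into chunks."""
    raw = json.dumps(events).encode("utf-8")
    # gzip does the compression; base64 only makes the bytes safe to embed in JSON.
    encoded = base64.b64encode(gzip.compress(raw)).decode("ascii")
    has_full_snapshot = any(e.get("type") == FULL_SNAPSHOT for e in events)
    chunk_id = uuid.uuid4().hex
    pieces = [encoded[i:i + chunk_size] for i in range(0, len(encoded), chunk_size)]
    return [
        {
            "chunk_id": chunk_id,
            "chunk_index": index,
            "chunk_count": len(pieces),
            "data": piece,
            # Kept outside the compressed data so it can be read without decompressing.
            "has_full_snapshot": has_full_snapshot,
        }
        for index, piece in enumerate(pieces)
    ]


def reassemble(chunks):
    """Rebuild the original list of events from all chunks sharing one chunk_id."""
    ordered = sorted(chunks, key=lambda c: c["chunk_index"])
    if not ordered or len(ordered) != ordered[0]["chunk_count"]:
        raise ValueError("missing chunks")
    encoded = "".join(c["data"] for c in ordered)
    return json.loads(gzip.decompress(base64.b64decode(encoded)))
```

Each chunk would then be stored as its own row, and the backend calls something like `reassemble` once it has every chunk for a `chunk_id`. Requirement 4 (signalling via the URL) is deliberately left out of this sketch.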

Describe alternatives you've considered

Additional context

https://github.com/PostHog/posthog/pull/3566/files got started with this process.

Thank you for your feature request – we love each and every one!

macobo added the enhancement and session recording labels Mar 12, 2021
macobo changed the title from "Chunking session recording parts" to "Chunking session recording events" Mar 12, 2021
macobo added a commit that referenced this issue Mar 19, 2021
Closes #3632 and replaces https://github.com/PostHog/posthog/pull/3566/files

This should make it possible to ingest large full snapshot events

Base64 is used to encode the compressed data for serialization purposes.

pytest-mock is used for cleanly patching methods
mariusandra pushed a commit that referenced this issue Mar 23, 2021
* Chunk session recording events

Closes #3632 and replaces https://github.com/PostHog/posthog/pull/3566/files

This should make it possible to ingest large full snapshot events

Base64 is used to encode the compressed data for serialization purposes.

pytest-mock is used for cleanly patching methods

* Mock time.time for py3.7 compatibility

* Group captured $snapshot events by $session_id

* Don't chunk already chunked payloads