This repository has been archived by the owner on Nov 4, 2021. It is now read-only.

Add exportEvents function #408

Merged
mariusandra merged 28 commits into master from export-events
May 27, 2021

Conversation

mariusandra
Collaborator

Changes

import { RetryError } from '@posthog/plugin-scaffold' // also a global

export async function exportEvents (events, { global, config }) {
    try {
        await fetch(`https://${config.host}/e`, {
            method: 'POST',
            body: JSON.stringify(events),
            headers: { 'Content-Type': 'application/json' },
        })
    } catch (error) {
        throw new RetryError() // ask to retry
    }
}
  • Closes Create exportEvents function in plugins that abstracts away all the queueing, batching, retrying logic #404
  • This PR adds support for exportEvents by using existing plugin server tools as building blocks.
  • That function receives a batch of events (configurable size) and either exports them (await fetch, etc) or throws RetryError. If the latter happens, we retry exporting the entire batch.
  • Basically it abstracts away this code from an export plugin into a very similar structure inside the plugin server.
  • After the VM is initialized, if an exportEvents function is exported, this creates a buffer via createBuffer, and adds or patches an existing onEvent function to add incoming events to that buffer.
  • We also add an exportEventsWithRetry job to retry events.
  • The config for this export comes from meta.config. As a plugin author, you can ask your users for these, or stick to the defaults:
    • exportEventsBufferBytes (1MB default)
    • exportEventsBufferSeconds (10 sec default)
    • exportEventsToIgnore (empty default)
  • It might feel like a weird place for this, but it turns out to be a nice compromise between 1) having defaults, 2) having a chance for users to specify these fields for certain plugins, 3) not having some meta.config parsing boilerplate in setupPlugin.
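
The flow described in the list above can be sketched roughly as follows. This is a minimal illustration, not the plugin server's actual code: createSketchBuffer, createExportBuffer, enqueueRetry and the { batch } payload shape are hypothetical names, RetryError is redefined locally as a stand-in for the one from @posthog/plugin-scaffold, and event size is approximated by the length of its JSON string.

```javascript
// Stand-in for RetryError from '@posthog/plugin-scaffold', defined locally so the sketch runs on its own.
class RetryError extends Error {}

// Hypothetical helper: NOT the plugin server's createBuffer, just the same idea.
// Collects items and flushes them when the size limit or the timeout is reached.
function createSketchBuffer({ limitBytes, timeoutSeconds, onFlush }) {
    let items = []
    let bytes = 0
    let timer = null
    const logError = (error) => console.error('flush failed', error)
    const buffer = {
        add(item) {
            const size = JSON.stringify(item).length // rough size estimate, in characters
            // Flush the current batch first if this item would push it over the limit.
            if (items.length > 0 && bytes + size > limitBytes) {
                buffer.flush().catch(logError)
            }
            items.push(item)
            bytes += size
            // Start the timeout when the first item of a new batch arrives.
            if (!timer) {
                timer = setTimeout(() => buffer.flush().catch(logError), timeoutSeconds * 1000)
            }
        },
        async flush() {
            if (timer) {
                clearTimeout(timer)
                timer = null
            }
            if (items.length === 0) {
                return
            }
            // Swap the batch out synchronously so new items start a fresh batch.
            const batch = items
            items = []
            bytes = 0
            await onFlush(batch)
        },
    }
    return buffer
}

// Hypothetical wiring: flushed batches go to exportEvents, failed batches go to a retry job.
function createExportBuffer(exportEvents, config, enqueueRetry) {
    return createSketchBuffer({
        limitBytes: Number(config.exportEventsBufferBytes) || 1024 * 1024, // 1MB default
        timeoutSeconds: Number(config.exportEventsBufferSeconds) || 10, // 10 sec default
        onFlush: async (batch) => {
            try {
                await exportEvents(batch)
            } catch (error) {
                if (!(error instanceof RetryError)) {
                    throw error
                }
                enqueueRetry({ batch }) // retry the entire batch, not single events
            }
        },
    })
}
```

In this sketch only a RetryError schedules a retry; any other error is logged (or rejected from flush) and the batch is dropped, which is one possible design choice rather than a statement about what the plugin server does.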

Please rip and nit this to shreds :).

Tests will still fail, will fix if so.

Checklist

  • Updated Settings section in README.md, if settings are affected
  • Jest tests

@mariusandra mariusandra requested a review from Twixes May 24, 2021 08:44
@mariusandra mariusandra requested a review from Twixes May 25, 2021 12:20
@mariusandra
Collaborator Author

Some things to improve still:

  • Add statsd
  • I'd still like to support throw new RetryError() in onEvent, onSnapshot and perhaps also processEvent. In the future, also in onRequest (incoming webhooks), onAction, etc.
  • I'm not happy with how onEvent gets overridden to a function that runs two functions (batch.add and onEvent). This would break the separate onEvent retry dynamic as it stands now.
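
One possible reading of the onEvent concern, as a minimal sketch: patchOnEvent is a hypothetical helper, and the buffer is anything with an add method. Because the patched function runs both steps, re-running it to retry the author's handler would also re-add the event to the buffer.

```javascript
// Hypothetical helper: wraps a plugin's onEvent so it also feeds the export buffer.
function patchOnEvent(plugin, buffer) {
    const originalOnEvent = plugin.onEvent
    plugin.onEvent = async (event) => {
        // Step 1: queue the event for exportEvents.
        buffer.add(event)
        // Step 2: run the plugin author's own handler, if there is one.
        if (originalOnEvent) {
            await originalOnEvent(event)
        }
    }
}
```

If step 2 throws and the whole patched function is called again for the same event, step 1 runs twice, so the event ends up in the export buffer twice. Keeping the two steps separately retryable would avoid that.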

@Twixes
Member

Twixes commented May 25, 2021

Looks okay for now. Though merge conflict and red tests.

@mariusandra
Collaborator Author

I added some statsd and made a new issue to track the other stuff: https://github.com/PostHog/plugin-server/issues/429

@Twixes
Member

Twixes commented May 25, 2021

One more question: does the buffer get flushed when the plugin gets unloaded?

@mariusandra
Collaborator Author

Very good question. Answer: yes. It does. Now.. :)

Comment on lines +27 to +29
    jobs: {
        exportEventsWithRetry: ExportEventsJobPayload
    }
Contributor

This is a bit confusing to me: aren't jobs supposed to be callables?

Collaborator Author

@mariusandra mariusandra May 26, 2021

Indeed they are, but the only thing you can actually configure is the payload of a job. So in an effort to reduce boilerplate and not give any wrong ideas, in this jobs: {} object you just specify the payload.

Basically, I think it's just nicer to write

    jobs: {
        exportEventsWithRetry: ExportEventsJobPayload
    }

instead of something with more boilerplate like:

    jobs: {
        exportEventsWithRetry: (payload: ExportEventsJobPayload) => Promise<void>
    }

@neilkakkar
Contributor

A question about the general flow of data here:

meta.global is local to the VM, right? So, we have buffers on each thread which are being filled and flushed as and when events come in for those specific plugins on those specific threads.

This is separate from the runAt / runWhenever functionality which is redlocked.

So, it's entirely possible we're sending X requests together, where X is the number of threads across all instances? We may have only 1 instance processing a job at a time, but that doesn't stop the buffer flushes on timeout, since this is separate from the jobs locking mechanism? *until it gets to a retry

(Correct me if I'm wrong somewhere please)

@mariusandra
Collaborator Author

@neilkakkar indeed, meta.global is vm-specific. I've been thinking it might make sense to call this meta.shared instead, but not sure. And indeed, that means a buffer is per-thread, and in the worst case, every thread per server might make a request at the same time. In reality, this might just mean we need to increase the size of the buffers. In any case, it'll still be better than without a buffer, just something to keep in mind going forward :)
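
To make the per-thread effect concrete, here is a toy simulation. It assumes events are spread round-robin across threads, which is an assumption made for illustration only, and simulateFlushes is a hypothetical helper in which each "thread" is just an array standing in for its own buffer.

```javascript
// Hypothetical simulation: every thread keeps its own buffer, so a timeout flush
// produces one request per thread that has buffered anything.
function simulateFlushes(threadCount, events) {
    const threads = Array.from({ length: threadCount }, () => [])
    // Assumed round-robin distribution of incoming events across threads.
    events.forEach((event, index) => threads[index % threadCount].push(event))
    // One entry per request, holding the number of events in that request.
    return threads.filter((buffered) => buffered.length > 0).map((buffered) => buffered.length)
}

const events = Array.from({ length: 8 }, (_, i) => ({ event: `e${i}` }))
console.log(simulateFlushes(4, events)) // [ 2, 2, 2, 2 ]: four small requests instead of one batch of 8
```

With a single shared buffer the same 8 events would go out as one request, which is why larger buffers are suggested as a mitigation here.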

@neilkakkar
Contributor

Agreed, just wanted to (1) ensure I understood correctly, and (2) make my concern explicit.

It's probably not worrisome right now, and definitely better to have a buffer than not, but as you say, something to keep in mind :)

Contributor

@neilkakkar neilkakkar left a comment

Awesome work!

@mariusandra mariusandra added the bump patch Bump patch version when this PR gets merged label May 27, 2021
@mariusandra mariusandra merged commit a00dd0a into master May 27, 2021
@mariusandra mariusandra deleted the export-events branch May 27, 2021 07:02
fuziontech pushed a commit to PostHog/posthog that referenced this pull request Oct 12, 2021
* export events via vm upgrade

* cleaner exportEvents upgrade, add RetryError

* add basic tests

* test more things, fix buffer length issue

* fix type

* add missing vm method

* add plugin scaffold to imports

* Use JSDoc style for tooltips and fix onEvent typing

* stringClamp

* remove dead code

* add consts

* log locally

* better log

* it's a hub now

* less events in benchmark to hopefully deflake a test

* fix type bug

* fix awkward bug

* add statsd for export event jobs

* typefix and rename

* fix ! in test

* flush on teardown

* config as a string, as it should

Co-authored-by: Michael Matloka <[email protected]>
Labels
bump patch Bump patch version when this PR gets merged
Development

Successfully merging this pull request may close these issues.

Create exportEvents function in plugins that abstracts away all the queueing, batching, retrying logic
3 participants