This repository has been archived by the owner on Nov 4, 2021. It is now read-only.

GeoIP via internal TCP #304

Merged
mariusandra merged 12 commits into master from mmdb-main-thread-only
Apr 12, 2021


@Twixes Twixes commented Apr 9, 2021

Changes

This relocates the built-in MMDB feature to the main thread only, by running a lightweight internal TCP server responding to GeoIP requests on localhost. Resolves PostHog/posthog#3888.

Story

Initially I looked into sharing the mmdb object (a ReaderModel instance) between threads somehow. It turns out that some raw data can be shared nicely with SharedArrayBuffer, but that's way too low-level to help here, because instantiating the database reader would nullify this optimization completely.
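To illustrate what "raw data" means here, below is a minimal sketch using plain `worker_threads` (not Piscina). The function name `readSharedBytesInWorker` and the byte values are hypothetical; the buffer merely stands in for the contents of an .mmdb file. The worker sees the same memory without a copy, but only as bytes, so any reader object would still have to be built on top of it in every thread.

```typescript
import { Worker } from 'worker_threads'

// Worker body kept as plain JavaScript so the sketch fits in one file.
const workerSource = `
    const { parentPort, workerData } = require('worker_threads')
    // Same memory as in the main thread, no copy, but still just raw bytes.
    const view = new Uint8Array(workerData.shared)
    parentPort.postMessage(Array.from(view))
`

// Hypothetical helper: puts bytes into shared memory and lets a worker read them back.
function readSharedBytesInWorker(bytes: number[]): Promise<number[]> {
    // Placeholder for the raw contents of an .mmdb file.
    const shared = new SharedArrayBuffer(bytes.length)
    new Uint8Array(shared).set(bytes)
    return new Promise((resolve, reject) => {
        // A SharedArrayBuffer in workerData is shared with the worker, not cloned.
        const worker = new Worker(workerSource, { eval: true, workerData: { shared } })
        worker.once('message', resolve)
        worker.once('error', reject)
    })
}

readSharedBytesInWorker([1, 2, 3, 4]).then((bytes) => console.log(bytes))
```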

I stumbled upon the mmap-object package. It uses mmap to share actual JS objects between Node processes. I attempted implementing it in #301: I had to fork the package and make some changes to get it to work at all (allenluce/mmap-object@master...Twixes:master). Memory usage did drop somewhat (anecdotal result: with 16 cores, memory usage went from 2.3 GB to 1.7 GB), but not nearly as much as would be optimal. A proper solution would really be embedded in mmdb-lib rather than implemented at a high level here, but that is not really feasible.

After that I searched for a way to use intraprocess MessagePorts. They're a great way to communicate between the main thread and worker threads, but the way Piscina works makes this unrealistic here (without significant upstream changes).
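For reference, this is roughly what MessagePort-based lookups look like with plain `worker_threads`, outside of Piscina. It is only a sketch: `countryByIp` and `lookupFromWorker` are hypothetical names, and the map stands in for the MMDB reader held by the main thread.

```typescript
import { MessageChannel, Worker } from 'worker_threads'

// Hypothetical stand-in for the MMDB reader that only the main thread holds.
const countryByIp = new Map<string, string>([['89.160.20.129', 'Sweden']])

// Worker body as plain JavaScript: ask the main thread for one lookup, report it, exit.
const workerSource = `
    const { parentPort, workerData } = require('worker_threads')
    const { port, ip } = workerData
    port.on('message', (country) => {
        parentPort.postMessage(country)
        port.close()
    })
    port.postMessage(ip)
`

function lookupFromWorker(ip: string): Promise<string | null> {
    const { port1, port2 } = new MessageChannel()
    // Main thread answers lookup requests arriving on its end of the channel.
    port1.on('message', (requestedIp: string) => {
        port1.postMessage(countryByIp.get(requestedIp) ?? null)
    })
    return new Promise((resolve, reject) => {
        const worker = new Worker(workerSource, {
            eval: true,
            workerData: { port: port2, ip },
            // A MessagePort passed in workerData must also be listed here.
            transferList: [port2],
        })
        worker.once('message', (country: string | null) => {
            port1.close()
            resolve(country)
        })
        worker.once('error', (error) => {
            port1.close()
            reject(error)
        })
    })
}

lookupFromWorker('89.160.20.129').then((country) => console.log(country))
```

The part that does not carry over is the worker setup: here the port is handed to the worker when it is created, which is the step a worker pool owns.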

Then I considered passing the GeoIP data to EVERY event right before runPlugins, but

  1. that'd be pretty heavy-handed, as this feature surely won't be used by everyone,
  2. it would, potentially confusingly, only work with the original IP and not with one added/changed by plugins earlier in the pipeline.

After all of the above, a local TCP (not even HTTP) server emerged as a workable solution. It's quite elegant and lightweight. I settled on port communication: a Unix socket could possibly be slightly more performant, but I wanted to avoid creating a sock file. Accepted request data is one IP address string per socket.write. Response data is then sent per socket.write until the connection is closed (in fact, it's closed after one response). Initially the response format was a JSON string, but since this is purely internal (localhost only), actually Buffer-based, and confined to a single plugin server instance, I switched to low-level v8.[de]serialize.
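The request/response flow described above can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: `fakeDatabase`, `startLookupServer`, `lookupIp` and `demo` are hypothetical names, and the map stands in for the real MMDB reader. It also assumes each short IP string arrives in a single `data` event, which TCP does not guarantee in general.

```typescript
import * as net from 'net'
import * as v8 from 'v8'

// Hypothetical stand-in for the MMDB reader, which would live only in the main thread.
type GeoResult = { country: string } | null
const fakeDatabase = new Map<string, GeoResult>([['89.160.20.129', { country: 'Sweden' }]])

// Main thread: answer one lookup per connection, then close the connection.
function startLookupServer(): Promise<net.Server> {
    return new Promise((resolve, reject) => {
        const server = net.createServer((socket) => {
            socket.on('error', () => socket.destroy())
            socket.once('data', (data) => {
                const ip = String(data).trim()
                // v8.serialize returns a Buffer, so no JSON string round trip is needed.
                socket.end(v8.serialize(fakeDatabase.get(ip) ?? null))
            })
        })
        server.once('error', reject)
        // Port 0 asks the OS for any free port; binding to 127.0.0.1 keeps it internal.
        server.listen(0, '127.0.0.1', () => resolve(server))
    })
}

// Worker side: send one IP, collect the response until the server closes the connection.
function lookupIp(port: number, ip: string): Promise<GeoResult> {
    return new Promise((resolve, reject) => {
        const chunks: Buffer[] = []
        const client = net.createConnection({ port, host: '127.0.0.1' }, () => client.write(ip))
        client.on('data', (chunk: Buffer | string) => {
            // No encoding is set, so chunks arrive as Buffers; the string branch is for the type checker.
            chunks.push(typeof chunk === 'string' ? Buffer.from(chunk) : chunk)
        })
        client.on('end', () => resolve(v8.deserialize(Buffer.concat(chunks)) as GeoResult))
        client.on('error', reject)
    })
}

// Looks up one known and one unknown IP, then shuts the server down.
async function demo(): Promise<GeoResult[]> {
    const server = await startLookupServer()
    const { port } = server.address() as net.AddressInfo
    try {
        return [await lookupIp(port, '89.160.20.129'), await lookupIp(port, '10.0.0.1')]
    } finally {
        server.close()
    }
}

demo().then((results) => console.log(results))
```

Because the server listens on port 0, the actual port has to be read back from `server.address()` and handed to whoever performs lookups.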

As with all solutions here, some downsides exist, though they should be manageable:

  • can't run multiple MMDB-enabled plugin server instances on one machine due to the port being blocked (update: not a problem with a randomized port)
  • TCP definitely has more overhead than doing this just by passing memory

Checklist

  • Updated Settings section in README.md, if settings are affected
  • Jest tests

@Twixes Twixes requested a review from mariusandra April 9, 2021 09:58
@Twixes Twixes added the bump patch Bump patch version when this PR gets merged label Apr 9, 2021
@mariusandra mariusandra enabled auto-merge (squash) April 12, 2021 14:30
@mariusandra mariusandra merged commit c496014 into master Apr 12, 2021
@mariusandra mariusandra deleted the mmdb-main-thread-only branch April 12, 2021 14:33
fuziontech pushed a commit to PostHog/posthog that referenced this pull request Oct 12, 2021
* Add INTERNAL_MMDB_SERVER_PORT setting

* Upgrade @maxmind/geoip2-node and @posthog/plugin-scaffold

* Rework MMDB feature to only run in the main thread with a TCP server

* Rework MMDB tests for TCP server

* Improve pluginsServer style

* Use random port MMDB server by default

* Clean server.ts up

* Simplify createMmdbServer return type

* Slightly increase mmdb.test.ts timeout
Development

Successfully merging this pull request may close these issues.

900Mb memory usage at idle