Add built-in GeoIP capabilities #247
Hey, it's definitely worth doing this download directly in the plugin server... and not inside a plugin. Some feedback, though:
- I see it's a 60MB .mmdb file that we're downloading. Can't we download a lean .gz of a few megabytes instead? We can't assume every user/server is behind a fast connection.
- Storing 60MB in Redis is way too much. Heroku's free Redis maxes out at 25MB, as we know. Even the first paid plan, at $15/mo, is 50MB. You'd have to pay $30/mo to get a Redis that fits this database. I don't think that's really an option.
- To be in line with all the other libraries that plugins can use (`posthog`, `google` and `fetch`, mainly, for now), this `geoip` should not be passed in `meta`, but as a global.
- What if the previously downloaded (and 7-day cached) .mmdb file expires... and somehow it's not possible to download it during the next restart of the plugin server? For example, in an airgapped system. We should not just delete the previously downloaded file from the cache, but keep using it and refresh it when it's stale.
- If you have a plugin server running for months without restarts (it could happen), it would be nice to periodically refresh the MaxMind database as well (e.g. check the md5sum of the downloaded file against what's on the update server every day... and update if needed).
- I think we shouldn't wait for a 60MB file to be downloaded when booting the server, but do it async. If a plugin then accesses this database, we could `await` until the database is loaded?
- Why the `geoip2-node` package and not the `mmdb-lib` package the other plugin used?
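The async-loading suggestion above could look roughly like the following minimal sketch. `LazyGeoIp`, `GeoDb` and the `load` callback are hypothetical names, not the plugin server's actual API. Boot calls `start()` without awaiting it, and every plugin that needs the database awaits the same in-flight promise, so the file is only fetched once.

```typescript
// Hypothetical sketch: LazyGeoIp and GeoDb are illustrative names, not the PR's API.
type GeoDb = { lookup: (ip: string) => string | null }

class LazyGeoIp {
    private readonly load: () => Promise<GeoDb>
    private loading: Promise<GeoDb> | null = null
    // Number of load attempts started (exposed only to make the behaviour observable).
    public loadCount = 0

    constructor(load: () => Promise<GeoDb>) {
        this.load = load
    }

    // Called at boot: starts the download in the background without awaiting it.
    start(): void {
        this.get().catch(() => {
            // Swallowed here; plugins calling get() will see the error and can retry.
        })
    }

    // All callers share the same in-flight promise, so the file is fetched once.
    get(): Promise<GeoDb> {
        if (this.loading === null) {
            this.loadCount++
            this.loading = this.load().catch((error) => {
                // Reset so a later call can retry (e.g. once the network is back).
                this.loading = null
                throw error
            })
        }
        return this.loading
    }
}
```

A plugin handler would then simply do `const db = await geoIp.get()` before looking up an IP. Only a plugin that actually needs GeoIP pays the wait, and server boot is never blocked on the download.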
New points:
As for 8, that would be very simple, but I wouldn't be very comfortable with it, because the plugin server has to store the DB in memory (
Re globals vs. meta: OK with switching to a global here for consistency for now, but I think a follow-up PR should also make all the extensions available in meta, Zapier
Re 8: I don't mean this as a default option, but rather that we could rebrand the existing "posthog maxmind plugin" as a "posthog custom maxmind database plugin" and have it share some code. Nothing will change in memory usage for people already using it.
Re globals, I think this is a discussion for #242. I'm not immediately opposed to this, but I'm not convinced either. It's another breaking change for unclear benefits (yes, yes, better typing) that must be managed... and since I'm still not completely convinced, I'd still like a plugin to just look like this:

```js
function runEveryMinute () {
    posthog.capture('ping')
}
```

instead of:

```ts
const plugin: Plugin = {
    runEveryMinute (meta) {
        meta.extensions.posthog.capture('ping')
    }
}
module.exports = plugin
```
I addressed all the feedback (aside from globals vs. meta, until there are other PRs around that) and added some useful mechanisms. For instance:
- a periodic background update;
- graceful handling of airgapped instances (the old MMDB is used or, if there's none, the server runs normally with just GeoIP disabled until the download can be performed by the background update job).
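The refresh behaviour described above (keep the old MMDB when a download fails, run with GeoIP disabled when nothing is cached, check periodically in the background) might be sketched as follows. This is a hypothetical illustration, not the code in this PR. `RefreshingMmdb` and `fetchMmdb` are made-up names. For simplicity, the sketch downloads the file and compares MD5 hashes locally, whereas the review suggested comparing against a checksum provided by the update server.

```typescript
import { createHash } from "node:crypto"

// Hypothetical sketch; RefreshingMmdb and fetchMmdb are illustrative names.
type Fetcher = () => Promise<Buffer>

class RefreshingMmdb {
    private readonly fetchMmdb: Fetcher
    private current: Buffer | null = null
    private currentMd5: string | null = null

    constructor(fetchMmdb: Fetcher) {
        this.fetchMmdb = fetchMmdb
    }

    // null means GeoIP is disabled, e.g. an airgapped instance with nothing cached yet.
    get buffer(): Buffer | null {
        return this.current
    }

    // Tries to fetch a newer DB; returns true only if a different DB was swapped in.
    async refresh(): Promise<boolean> {
        let fresh: Buffer
        try {
            fresh = await this.fetchMmdb()
        } catch {
            // Download failed: keep serving whatever we already have (possibly nothing).
            return false
        }
        const md5 = createHash("md5").update(fresh).digest("hex")
        if (md5 === this.currentMd5) {
            return false // same file as before, nothing to swap
        }
        // Plain reference swap: callers already holding the old buffer keep using it.
        this.current = fresh
        this.currentMd5 = md5
        return true
    }

    // Periodic background check; unref() so the timer alone doesn't keep the process alive.
    startBackgroundRefresh(intervalMs: number): NodeJS.Timeout {
        const timer = setInterval(() => {
            void this.refresh()
        }, intervalMs)
        timer.unref()
        return timer
    }
}
```

Because a failed `refresh()` never clears `current`, an instance that loses connectivity keeps its last good database indefinitely. An instance that never had one simply reports `null` until a download succeeds.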
Also, https://github.com/PostHog/posthog-mmdb now automatically updates the DB from MaxMind's servers every Monday, Wednesday, and Friday at 2:00 AM UTC, thanks to GitHub Actions.
...and of course, tests and merge conflicts.
…r#247)

* Add @maxmind/geoip2-node as dependency
* Add Server.geoIp with downloaded and cached MMDB
* Inject GeoIP extension into plugin meta
* Upgrade plugin-scaffold
* Disable MMDB fetching in tests
* Update README.md
* Fix "plugin meta has what it should have"
* Fix imports
* Use mmdb.posthog.net
* Rework MMDB approach for plugin attachments and Brotli
* Run prettier
* Improve concurrency handling and restructure files
* Add periodic in-flight MMDB staleness check with no-interrupt update
* Polish up staleness check
* Fix schedule
* Gracefully handle airgapped systems
* Update plugins.test.ts
* Update README.md
* Roll back unrelated config updates
* Add tests
* Fix plugin attachments in tests
Changes
This will allow us to efficiently run GeoIP for all users by making it a built-in capability of the plugin server.
It works by downloading an MMDB binary blob from our new microservice (currently at https://posthog-mmdb.herokuapp.com). Maybe we could put the DB in this open-source repo, but I'm a bit wary of licensing: the license may not be broad enough to allow redistribution.
Then a plugin (like our new GeoIP plugin built specifically for this: https://github.com/PostHog/posthog-plugin-geoip) can access the database and efficiently get location data.
I tried putting all of this into the above new plugin, but considering the size of the database, the complications of downloading it in a distributed setup, and the complexity of the dependencies, it's MUCH simpler to do this inside the plugin server.
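The commit list mentions Brotli, so the downloaded blob is presumably compressed. Assuming that, a minimal decompression step using Node's built-in `zlib` could look like this. `decompressMmdb` is a hypothetical helper, not a function from this PR. The sanity check relies on the MaxMind DB format's metadata marker (the bytes `\xAB\xCD\xEF` followed by `MaxMind.com`). The resulting buffer could then be handed to an MMDB reader such as `Reader.openBuffer` from `@maxmind/geoip2-node`.

```typescript
import { brotliCompressSync, brotliDecompressSync } from "node:zlib"

// Per the MaxMind DB format spec, the metadata section begins with "\xAB\xCD\xEF" + "MaxMind.com".
const METADATA_MARKER = Buffer.concat([Buffer.from([0xab, 0xcd, 0xef]), Buffer.from("MaxMind.com", "ascii")])

// Hypothetical helper: assumes the downloaded blob is Brotli-compressed.
export function decompressMmdb(compressed: Buffer): Buffer {
    const mmdb = brotliDecompressSync(compressed)
    // Cheap sanity check before handing the bytes to an MMDB reader.
    if (mmdb.lastIndexOf(METADATA_MARKER) === -1) {
        throw new Error("Decompressed blob does not look like an MMDB file")
    }
    return mmdb
}

// Demo with stand-in bytes (not a real MMDB): compress, then decompress and compare.
const fake = Buffer.concat([Buffer.from("data section"), METADATA_MARKER, Buffer.from("metadata")])
const roundTripped = decompressMmdb(brotliCompressSync(fake))
console.log(roundTripped.equals(fake)) // true
```

Decompressing in-process keeps the transferred file small, which addresses the concern about users on slow connections. The marker check means a truncated or wrong download fails loudly instead of being cached.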