Download the geoip databases only when needed #92335
Conversation
Hi @masseyke, I've created a changelog YAML for you.
Hi @masseyke, I've updated the changelog YAML for you.
…b.com:masseyke/elasticsearch into fix/download-geoip-databases-only-when-needed
@elasticmachine update branch
@elasticmachine update branch
Pinging @elastic/es-data-management (Team:Data Management)
@elasticmachine update branch
First pass: I think we've identified a couple of potential leaks and some unlikely-but-possible concurrency problems pertaining to the cluster service usage. Posting the review comments for now to get the ideas in writing. Will continue to review.
@@ -125,6 +125,8 @@ protected GeoIpDownloader createTask(
        parentTaskId,
        headers
    );
    clusterService.addListener(downloader);
I think we should update the GeoIpDownloader to remove itself from the cluster listeners when it is cancelled.
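A minimal sketch of that suggestion, assuming the downloader keeps a reference to the ClusterService it registered with (the exact override in the PR may differ):

```java
// Sketch only: un-register this task from cluster state updates once it is
// cancelled, so a stale GeoIpDownloader instance can no longer react to
// cluster state changes.
@Override
protected void onCancelled() {
    clusterService.removeListener(this);
    // ...the existing cancellation handling would remain here...
}
```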
clusterService.getClusterSettings().addSettingsUpdateConsumer(EAGER_DOWNLOAD_SETTING, this::setEagerDownload);
clusterService.getClusterSettings().addSettingsUpdateConsumer(POLL_INTERVAL_SETTING, this::setPollInterval);
We talked about this outside of the PR messaging, but we both think this could lead to some messy leaks.
Every time this task is allocated to a node, these settings listeners are installed again. There is no way to remove them, and it doesn't look like they correctly check whether the task has been cancelled before rescheduling the background logic when they change. We should definitely protect against multiple downloaders running when updating either of these settings.
We could add a check to the update methods to make sure the task is active, but that still leaves the possibility of a number of leaked task instances in the cluster service on any nodes that have previously cancelled them. We agreed that it would be better to add a settings container of some sort that keeps a reference to the active downloader task on a node and calls it during updates if it exists. The settings container is lightweight and can stick around without much overhead since there will only ever be one of them, while the downloaders can come and go without leaking multiple update consumers.
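For illustration, a rough sketch of that settings-container idea (the class name and wiring are hypothetical, not the code that ultimately landed; `AtomicReference`, `ClusterService`, and the two setting constants are assumed to be imported):

```java
// Hypothetical "settings container": one long-lived object per node registers
// the update consumers exactly once and forwards changes to whichever
// downloader task is currently active, so tasks can come and go without
// leaking consumers.
class GeoIpDownloaderSettingsContainer {
    private final AtomicReference<GeoIpDownloader> active = new AtomicReference<>();

    GeoIpDownloaderSettingsContainer(ClusterService clusterService) {
        clusterService.getClusterSettings().addSettingsUpdateConsumer(EAGER_DOWNLOAD_SETTING, eager -> {
            GeoIpDownloader task = active.get();
            if (task != null && task.isCancelled() == false) {
                task.setEagerDownload(eager);
            }
        });
        // POLL_INTERVAL_SETTING would be wired up the same way
    }

    void setActive(GeoIpDownloader task) {
        active.set(task);
    }
}
```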
The summary of another offline conversation:
It's just a bad idea to have these relatively-short-lived objects like GeoIpDownloader be settings or cluster state listeners because:
- You have to be careful to un-listen when you're done so that you don't wind up with leaks
- You have increased risk of race conditions since the condition you are listening for could happen in between the time the object is created and the time it is registered as a listener (not to mention when its predecessor is un-registered).
So I've moved all listeners for settings and cluster state changes into the executor. The task gets the current values for these things from the executor when it needs them, and the executor handles telling the task when it needs to reschedule because of a change in a dynamic property.
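A sketch of what that arrangement could look like (any field or method names not visible in the diffs above, such as `currentTask` and `requestReschedule()`, are assumptions; `ClusterService`, `TimeValue`, and the setting constants are assumed to be imported):

```java
// Illustrative only: the long-lived executor is the sole settings listener; it
// caches the dynamic values, and the task reads them from the executor on
// demand. The executor nudges the active task to reschedule when a value
// changes.
class GeoIpDownloaderTaskExecutor {
    private volatile boolean eagerDownload;
    private volatile TimeValue pollInterval;
    private volatile GeoIpDownloader currentTask; // set when the task is created or cancelled on this node

    void registerSettingsListeners(ClusterService clusterService) {
        clusterService.getClusterSettings().addSettingsUpdateConsumer(EAGER_DOWNLOAD_SETTING, this::setEagerDownload);
        clusterService.getClusterSettings().addSettingsUpdateConsumer(POLL_INTERVAL_SETTING, this::setPollInterval);
    }

    private void setEagerDownload(boolean eagerDownload) {
        this.eagerDownload = eagerDownload;
        GeoIpDownloader task = currentTask;
        if (task != null) {
            task.requestReschedule(); // let the task re-plan its next run with the new value
        }
    }

    private void setPollInterval(TimeValue pollInterval) {
        this.pollInterval = pollInterval;
        GeoIpDownloader task = currentTask;
        if (task != null) {
            task.requestReschedule();
        }
    }

    // the task calls these instead of registering its own update consumers
    boolean eagerDownload() {
        return eagerDownload;
    }

    TimeValue pollInterval() {
        return pollInterval;
    }
}
```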
Co-authored-by: James Baiera <[email protected]>
…avoid race conditions
@elasticmachine update branch
Everything is looking really great, thanks for iterating on the listener registration woes. We sync'd up offline, but there's definitely an issue with the processor detection logic. I think once that is squared away we're pretty much there!
@@ -85,8 +85,9 @@ public class IngestGeoIpPlugin extends Plugin implements IngestPlugin, SystemInd
    public List<Setting<?>> getSettings() {
        return Arrays.asList(
            CACHE_SIZE,
            GeoIpDownloaderTaskExecutor.EAGER_DOWNLOAD_SETTING,
Nit: Can we tidy these up so that they are all ordered?
Sure. Done.
}

if (event.metadataChanged() && event.changedCustomMetadataSet().contains(IngestMetadata.TYPE)) {
    boolean newAtLeastOneGeoipProcessor = hasAtLeastOneGeoipProcessor(event.state());
Hmm, it feels a little weird that we potentially calculate this a second time when bootstrapping. Probably not a performance bug, but it does look a little funny, especially since we calculate it with a different state call here than earlier in the same invocation.
I don't think it will usually happen twice, will it? It happens on bootstrap, and then happens below only if the cluster state change includes an ingest metadata change (so only if someone modifies a pipeline definition), right?
Ah yes, I hadn't thought about that. Looks fine to me then.
} else {
    stopTask(() -> clusterService.addListener(this));
if (taskIsBootstrapped.getAndSet(true) == false) {
    this.atLeastOneGeoipProcessor = hasAtLeastOneGeoipProcessor(clusterService.state());
Assuming that the comment about running this line twice on bootstrap isn't a big problem - should this be using `event.state()` instead of `clusterService.state()` for consistency? I don't think anything breaks as is, but my gut leans toward consistency.
Yeah might as well be consistent. Done.
}

if (event.metadataChanged() && event.changedCustomMetadataSet().contains(IngestMetadata.TYPE)) {
Nice!
Map<String, Object> pipelineMap = pipelineDefinition.getConfigAsMap();
List<Map<String, Object>> processors = (List<Map<String, Object>>) pipelineMap.get(Pipeline.PROCESSORS_KEY);
if (processors != null) {
    return processors.stream().anyMatch(processor -> processor.containsKey(GeoIpProcessor.TYPE));
Just realized that this might technically be too simple of a check. Annoyingly, processors can supply `on_failure` fields that contain processors, which can supply `on_failure` fields of their own, and I don't think we have any restriction on depth nor processor type allowed. The `foreach` processor also has a nested processor. Not sure if there are any others off the top of my head.
Wow good catch. I just changed this check to be recursive for those two things. I don't know of any others either.
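For illustration, a recursive variant along those lines (a sketch, not the exact code from the PR; it assumes the same nested-map pipeline configuration as the snippet above, with `java.util.Map` and `java.util.List` imported in the surrounding class):

```java
// Sketch of a recursive check: besides a top-level geoip processor, also
// descend into `on_failure` handlers (which any processor may carry) and the
// `foreach` processor's single nested processor definition.
@SuppressWarnings("unchecked")
private static boolean hasGeoipProcessor(Map<String, Object> processor) {
    if (processor.containsKey(GeoIpProcessor.TYPE)) {
        return true;
    }
    for (Object value : processor.values()) {
        if (value instanceof Map<?, ?> config) {
            Object onFailure = config.get("on_failure");
            if (onFailure instanceof List<?> onFailureProcessors) {
                for (Object p : onFailureProcessors) {
                    if (p instanceof Map && hasGeoipProcessor((Map<String, Object>) p)) {
                        return true;
                    }
                }
            }
            Object nested = config.get("processor"); // the foreach processor's nested processor
            if (nested instanceof Map && hasGeoipProcessor((Map<String, Object>) nested)) {
                return true;
            }
        }
    }
    return false;
}
```

The stream check from the earlier snippet would then call a helper like this for each top-level processor instead of checking `containsKey(GeoIpProcessor.TYPE)` directly.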
Tests are looking pretty good though! Just some additional corrections for ya!
LGTM!
Thanks again for iterating on this! The changes look really great!
This commit changes the geoip downloader so that we only download the geoip databases if you have at least one geoip processor in your cluster, or when you add a new geoip processor (or if `ingest.geoip.downloader.eager.download` is explicitly set to true).
We currently download the geoip databases (used by the geoip processor) whether you need them or not. This PR changes it so that we only download them if you have at least one geoip processor in your cluster, or when you add a new geoip processor.
Closes #90673