Load only required corpora #555
With this commit, Rally loads only the corpora (document sets) that are referenced by tasks in the currently selected challenge. This shortens race setup by avoiding unnecessary downloads and decompression of document sets. Closes elastic#553
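A minimal sketch of the selection idea described above, not the code from this PR. The `Task` and `Challenge` classes, the `corpora` attribute, and the `used_corpora` helper are hypothetical stand-ins for illustration; Rally's actual track model and loader differ.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical stand-in: the names of the corpora this task reads from.
    name: str
    corpora: list = field(default_factory=list)


@dataclass
class Challenge:
    # Hypothetical stand-in for a challenge and its schedule of tasks.
    name: str
    tasks: list = field(default_factory=list)


def used_corpora(all_corpora, challenge):
    """Return only the corpora that at least one task in the challenge references."""
    referenced = set()
    for task in challenge.tasks:
        referenced.update(task.corpora)
    # Keep the order in which the track declares its corpora.
    return [name for name in all_corpora if name in referenced]


track_corpora = ["geonames", "http_logs", "nyc_taxis"]
challenge = Challenge(
    "append-only",
    [Task("bulk-index", ["http_logs"]), Task("search", [])],
)
required = used_corpora(track_corpora, challenge)
print(required)
```

Only the corpora in `required` would then need to be downloaded and decompressed; the others are skipped during race setup.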
Elegant solution, thanks! LGTM; left a minor comment.
esrally/track/track.py
Outdated
@@ -214,6 +214,13 @@ def filter(self, source_format=None, target_indices=None):
                filtered.append(d)
        return DocumentCorpus(self.name, filtered)

    def union(self, other):
        assert self.name == other.name, "Both document corpora must have the same name"
We've discussed this for other cases before, but would it be more consistent with existing behavior to throw an exception here?
I can address this, sure.
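For illustration, one way the reviewer's suggestion could look is sketched below. This is an assumption, not the change that was merged: `CorpusMismatchError` is a hypothetical exception type (Rally's own exception class may differ), the simplified `DocumentCorpus` constructor is a stand-in, and the merge logic is a guess, since the diff above shows only the `assert` line of `union`.

```python
class CorpusMismatchError(Exception):
    """Hypothetical exception type; Rally's real exception class may differ."""


class DocumentCorpus:
    # Simplified stand-in for the class in esrally/track/track.py.
    def __init__(self, name, documents=None):
        self.name = name
        self.documents = documents if documents is not None else []

    def union(self, other):
        # Raise instead of assert so the check also runs under `python -O`,
        # which strips assert statements.
        if self.name != other.name:
            raise CorpusMismatchError(
                "Both document corpora must have the same name "
                f"(got '{self.name}' and '{other.name}')"
            )
        # Assumed merge semantics: keep order, drop duplicate documents.
        merged = list(self.documents)
        for doc in other.documents:
            if doc not in merged:
                merged.append(doc)
        return DocumentCorpus(self.name, merged)


a = DocumentCorpus("logs", ["a.json"])
b = DocumentCorpus("logs", ["a.json", "b.json"])
merged = a.union(b)
print(merged.documents)
```

An explicit exception also lets callers catch the mismatch and report it to the user, which a bare `AssertionError` does not do cleanly.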
With this commit we load track plugins eagerly when preparing the track. We need to do this because the track loader needs to query all parameter sources for the corpora that are in use. Relates #555