Load only required corpora #555

Merged


danielmitterdorfer
Member

With this commit Rally will only load the required corpora / document
sets that are actually referenced by the tasks in the currently selected
challenge. This saves race setup time by avoiding unneeded downloads and
decompression of document sets.

Closes #553
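
To illustrate the idea, here is a minimal sketch of filtering corpora down to those referenced by the tasks of the selected challenge. It is not the code from this PR: the `Corpus` and `Task` classes, the `corpora` field on a task, and the `required_corpora` helper are hypothetical names chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Corpus:
    # Hypothetical stand-in for a document corpus defined by a track.
    name: str
    documents: list = field(default_factory=list)


@dataclass
class Task:
    # Hypothetical stand-in for a task in a challenge.
    name: str
    # Names of the corpora this task reads from (assumed field).
    corpora: list = field(default_factory=list)


def required_corpora(all_corpora, tasks):
    """Return only the corpora referenced by at least one task, in declaration order."""
    used = {name for task in tasks for name in task.corpora}
    return [c for c in all_corpora if c.name in used]


corpora = [Corpus("geonames"), Corpus("http_logs"), Corpus("nyc_taxis")]
challenge = [Task("index-append", ["geonames"]), Task("term-query", ["geonames"])]

selected = required_corpora(corpora, challenge)
print([c.name for c in selected])  # ['geonames']
```

Only the selected corpora would then need to be downloaded and decompressed, which is where the setup time is saved.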

danielmitterdorfer added the "enhancement" and ":Track Management" labels Aug 24, 2018
danielmitterdorfer added this to the 1.0.1 milestone Aug 24, 2018
Contributor

dliappis left a comment

Elegant solution, thanks! LGTM, left a minor comment.

@@ -214,6 +214,13 @@ def filter(self, source_format=None, target_indices=None):
            filtered.append(d)
        return DocumentCorpus(self.name, filtered)

    def union(self, other):
        assert self.name == other.name, "Both document corpora must have the same name"
Contributor

We've discussed this for other cases before, but would it be more consistent with existing behavior to throw an exception here?

Member Author

I can address this, sure.
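
For reference, a minimal sketch of what raising an exception instead of asserting could look like. This is an illustration under assumptions, not the change that was merged: `CorpusMismatchError` is a hypothetical exception type, and the `DocumentCorpus` class below is a simplified stand-in with only `name` and `documents`. One practical difference is that `assert` statements are skipped when Python runs with `-O`, whereas an explicit `raise` always takes effect.

```python
class CorpusMismatchError(Exception):
    """Hypothetical exception type; stands in for whatever the project uses."""


class DocumentCorpus:
    # Simplified stand-in: the real class carries more state.
    def __init__(self, name, documents=None):
        self.name = name
        self.documents = list(documents or [])

    def union(self, other):
        # Raise explicitly so the check is not removed under `python -O`.
        if self.name != other.name:
            raise CorpusMismatchError(
                "Both document corpora must have the same name "
                f"(got '{self.name}' and '{other.name}')"
            )
        if self is other:
            return self
        # Keep first-seen order and drop duplicate documents.
        merged = list(self.documents)
        for doc in other.documents:
            if doc not in merged:
                merged.append(doc)
        return DocumentCorpus(self.name, merged)
```

Callers can then catch the specific exception type rather than a generic `AssertionError`.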

danielmitterdorfer merged commit a9059ce into elastic:master Aug 27, 2018
danielmitterdorfer added a commit that referenced this pull request Aug 27, 2018
With this commit we load track plugins eagerly when preparing the track.
We need to do this because the track loader needs to query all parameter
sources for the corpora that are in use.

Relates #555
danielmitterdorfer added a commit that referenced this pull request Sep 4, 2018
danielmitterdorfer deleted the load-required-corpora branch March 5, 2019 07:29