Mechanism needed for benchmarking update-heavy workflows against create-only or index-only #422
Comments
Thanks for the suggestion @redneckbeard. I would: …
As you write, for more complex cases there is always the option of providing the action-and-metadata line in the original source file.
@danielmitterdorfer that sounds great. Where things got messy for me was having to parse the JSON for the metadata here: https://github.com/elastic/rally/blob/master/esrally/track/params.py#L817, and if you have an …
My initial thought was to implement this in …
With this commit we introduce a new property `conflict-probability` for the bulk-indexing parameter source. Previously we had a hard-coded probability of 25% but now the user can control it. We also now use `update` as the bulk-indexing action when simulating a conflict (previously the action was always `index`). Closes #422
* Improve simulation of bulk-indexing conflicts
  With this commit we introduce a new property `conflict-probability` for the bulk-indexing parameter source. Previously we had a hard-coded probability of 25% but now the user can control it. We also add a new property `on-conflict` which allows users to define whether the action-and-metadata line should use "index" or "update" on simulated id conflicts (the default is "index"). Closes #422
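For context, a bulk operation in a Rally track might then be declared roughly as follows. This is only a sketch: `conflict-probability` and `on-conflict` are the properties described in the commit message above, while the operation name, the `bulk-size`, and the `conflicts` mode are illustrative values, not taken from this issue.

```json
{
  "name": "index-update-heavy",
  "operation-type": "bulk",
  "bulk-size": 5000,
  "conflicts": "random",
  "conflict-probability": 50,
  "on-conflict": "update"
}
```

With a configuration like this, roughly half of the generated documents would reuse an existing id, and those conflicting documents would be sent as `update` actions instead of being re-indexed.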
Currently, Rally can be configured to generate metadata that contains conflicting ids for 25% of the documents in a corpus. However, the action specified in the metadata is always `index`. Since the performance characteristics of `update` changed significantly in 5.0+, this seems like a blind spot in the current suite.

While I understand that metadata can be interleaved with the documents in a corpus instead of being generated by Rally, a side-by-side comparison of the performance of the same corpus with conflicting ids would be considerably simpler if the create/update workflow could use generated metadata.
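To make the difference concrete, the two variants of the generated action-and-metadata line would follow the standard Elasticsearch bulk format, as in this sketch (index name, id, and field values are illustrative; note that an `update` action carries its partial document wrapped in a `doc` object):

```json
{ "index": { "_index": "geonames", "_id": "4242" } }
{ "name": "Bern", "population": 133798 }
{ "update": { "_index": "geonames", "_id": "4242" } }
{ "doc": { "name": "Bern", "population": 133798 } }
```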