Mechanism needed for benchmarking update-heavy workflows against create-only or index-only #422

Closed
redneckbeard opened this issue Feb 27, 2018 · 3 comments
Labels
enhancement (Improves the status quo), :Track Management (New operations, changes in the track format, track download changes and the like)
Milestone

Comments

@redneckbeard

Currently, Rally can be configured to generate metadata that contains conflicting ids for 25% of the documents in a corpus. However, the action specified in the metadata is always index. Since the performance characteristics of update changed significantly in 5.0+, this seems like a blind spot in the current suite.

While I understand that metadata can be interleaved with the documents in a corpus instead of being generated by Rally, a side-by-side performance comparison on the same corpus with conflicting ids would be considerably simpler if the create/update workflow could use generated metadata.

danielmitterdorfer added the enhancement and :Track Management labels Feb 27, 2018
@danielmitterdorfer
Member

Thanks for the suggestion @redneckbeard. I would:

  • make the percentage of conflicts configurable (0-100%)
  • use update instead of index in the action and meta-data line for conflicting ids

As you write, for more complex cases there is always the possibility to provide the action and meta-data line in the original source file.
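The two points above could be sketched roughly as follows. This is a minimal illustration, not Rally's actual implementation: the function name `action_meta_lines`, its parameters, the index name, and the seeding are all hypothetical.

```python
import json
import random


def action_meta_lines(num_docs, conflict_probability=25, on_conflict="update",
                      index="my-index", seed=42):
    """Yield (action, action-and-metadata line) pairs for a bulk request.

    Hypothetical helper for illustration only; it is not Rally's
    GenerateActionMetaData.
    """
    if not 0 <= conflict_probability <= 100:
        raise ValueError("conflict_probability must be between 0 and 100")
    rng = random.Random(seed)
    used_ids = []
    next_id = 0
    for _ in range(num_docs):
        # Reuse an already issued id with the configured probability.
        # The very first document can never conflict.
        if used_ids and rng.random() * 100 < conflict_probability:
            doc_id = rng.choice(used_ids)
            action = on_conflict
        else:
            doc_id = next_id
            next_id += 1
            used_ids.append(doc_id)
            action = "index"
        yield action, json.dumps({action: {"_index": index, "_id": str(doc_id)}})
```

With a probability of 0 every line uses index; with 100 every line after the first reuses an existing id and uses the conflict action.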

@redneckbeard
Author

@danielmitterdorfer that sounds great. Where things got messy for me was having to parse the JSON for the metadata here https://github.com/elastic/rally/blob/master/esrally/track/params.py#L817 and, for an update, wrap the document JSON in {"doc": %(document)s}. I figured maintainers would be able to see a cleaner way.
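For illustration, the wrapping step could look like the sketch below. It assumes the action is already known when the document line is emitted, so the metadata line does not have to be parsed again; `bulk_body_line` is a hypothetical helper, not something in Rally.

```python
import json


def bulk_body_line(action, document_json):
    """Return the line that follows the action-and-metadata line.

    Hypothetical helper: `action` is passed in directly so the metadata
    JSON does not need to be parsed to find out what it was.
    """
    document_json = document_json.strip()
    if action == "update":
        # A bulk update with a partial document expects it under "doc".
        return '{"doc": %s}' % document_json
    return document_json


# Usage: the wrapped line is still valid JSON.
line = bulk_body_line("update", '{"title": "hello"}\n')
assert json.loads(line) == {"doc": {"title": "hello"}}
```

Wrapping by string formatting avoids deserializing every document, which matters when the corpus is large.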

@danielmitterdorfer
Member

My initial thought was to implement this in GenerateActionMetaData. However, I need to check whether this is really feasible (it probably is), and it will take a little while until I get to it as I'm busy at ElasticON this week.

danielmitterdorfer added this to the 1.x milestone Apr 12, 2018
danielmitterdorfer added a commit that referenced this issue Apr 18, 2018
With this commit we introduce a new property `conflict-probability` for
the bulk-indexing parameter source. Previously we had a hard-coded
probability of 25% but now the user can control it.

We also use `update` now as bulk-indexing action when simulating a
conflict (previously the action was always `index`).

Closes #422
danielmitterdorfer modified the milestones: 1.x, 0.10.2 Apr 20, 2018
danielmitterdorfer self-assigned this Apr 20, 2018
danielmitterdorfer added a commit that referenced this issue Apr 25, 2018
* Improve simulation of bulk-indexing conflicts

With this commit we introduce a new property `conflict-probability` for
the bulk-indexing parameter source. Previously we had a hard-coded
probability of 25% but now the user can control it.

We also add a new property `on-conflict` which allows users
to define whether the action-and-metadata line should use "index" or
"update" on simulated id conflicts (the default is "index").

Closes #422
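
As a rough sketch of how the two new properties might appear in a track's bulk operation, the snippet below builds the operation as a Python dict and prints it as JSON. Only `conflict-probability` and `on-conflict` come from the commit message above; the operation name, `operation-type`, `bulk-size`, and `conflicts` values are assumptions made for illustration and may not match a given Rally version.

```python
import json

operation = {
    "name": "bulk-update-heavy",   # hypothetical operation name
    "operation-type": "bulk",      # assumption: may differ by Rally version
    "bulk-size": 5000,             # assumption: arbitrary example value
    "conflicts": "random",         # assumption: enables id conflicts
    "conflict-probability": 50,    # from the commit: user-controlled percentage
    "on-conflict": "update",       # from the commit: "index" (default) or "update"
}

print(json.dumps(operation, indent=2))
```

Running the same corpus once with `on-conflict` set to "index" and once with "update" would give the side-by-side comparison the issue asks for.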
danielmitterdorfer removed their assignment Apr 25, 2018
2 participants