Mechanism needed for benchmarking update-heavy workflows against create-only or index-only #422

redneckbeard · 2018-02-27T15:11:17Z

Currently, rally can be configured to generate metadata that contains conflicting ids for 25% of the documents in a corpus. However, the action specified in the metadata is always index. Since the performance characteristics of update changed significantly in 5.0+, this seems like a blind spot in the current suite.

While I understand that metadata can be interleaved with the documents in a corpus instead of being generated by rally, a side-by-side performance comparison of the same corpus with conflicting ids would be considerably simpler if the create/update workflow could also use generated metadata.


danielmitterdorfer · 2018-02-27T16:39:51Z

Thanks for the suggestion @redneckbeard. I would:

- make the amount of conflicts configurable (0 - 100%)
- use update instead of index in the action and meta-data line for conflicting ids
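A minimal sketch of the two proposals above, assuming a generator that emits one action-and-metadata line per document. `ConflictingIdGenerator` and `next_line` are hypothetical names, not Rally's actual `GenerateActionMetaData`; only the standard bulk metadata fields `_index` and `_id` are emitted, and a conflicting id is assumed to be drawn uniformly from the ids emitted so far.

```python
import json
import random


class ConflictingIdGenerator:
    """Hypothetical generator of bulk action-and-metadata lines.

    With probability conflict_probability / 100 it reuses an id emitted
    earlier (a simulated conflict) and uses the "update" action; otherwise
    it emits a fresh id with the "index" action.
    """

    def __init__(self, index_name, conflict_probability=25, seed=None):
        if not 0 <= conflict_probability <= 100:
            raise ValueError("conflict_probability must be between 0 and 100")
        self.index_name = index_name
        self.probability = conflict_probability / 100.0
        self.rng = random.Random(seed)
        self.emitted_ids = []

    def next_line(self):
        # A conflict is only possible once at least one id has been emitted.
        if self.emitted_ids and self.rng.random() < self.probability:
            doc_id = self.rng.choice(self.emitted_ids)
            action = "update"
        else:
            doc_id = "%010d" % len(self.emitted_ids)
            self.emitted_ids.append(doc_id)
            action = "index"
        return json.dumps({action: {"_index": self.index_name, "_id": doc_id}})


gen = ConflictingIdGenerator("logs", conflict_probability=50, seed=42)
for _ in range(5):
    print(gen.next_line())
```

With `conflict_probability=0` every line uses `index` with a fresh id; with `100`, every line after the first reuses an existing id and uses `update`.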

As you write, for more complex cases there is always the possibility to provide the action and meta-data line in the original source file.

redneckbeard · 2018-02-27T16:45:03Z

@danielmitterdorfer that sounds great. Where things got messy for me was having to parse the JSON for the metadata here https://github.com/elastic/rally/blob/master/esrally/track/params.py#L817, and if you have an update, wrap the document json in {"doc": %(document)s}. I figured maintainers would be able to see a cleaner way.
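The `{"doc": ...}` wrapping does not necessarily require parsing the metadata JSON: if the code that chooses the action also produces the source line, it can simply pass the action along. A sketch under that assumption; `source_line` and `bulk_pair` are hypothetical helpers, and the document is treated as an opaque JSON string so it is never decoded.

```python
import json


def source_line(action, document):
    """Return the bulk source line for a raw JSON document string.

    For "update" the bulk API expects the (partial) document under "doc";
    for "index" the document is sent unchanged. Plain string formatting
    avoids decoding and re-encoding the document.
    """
    document = document.strip()
    if action == "update":
        return '{"doc": %s}' % document
    if action == "index":
        return document
    raise ValueError("unsupported action: %s" % action)


def bulk_pair(action, metadata, document):
    """Hypothetical helper: one action-and-metadata line plus its source line."""
    return "%s\n%s\n" % (json.dumps({action: metadata}), source_line(action, document))


print(bulk_pair("update", {"_index": "logs", "_id": "0000000003"}, '{"status": 200}'))
```

Keeping the document as a string means the extra cost for an update is one string concatenation rather than a JSON round trip per document.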

danielmitterdorfer · 2018-02-27T17:02:02Z

My initial thought was to implement this in GenerateActionMetaData. However, I need to check whether this is really feasible (it probably is), and it will take a little while until I get to it as I'm busy at ElasticON this week.

With this commit we introduce a new property `conflict-probability` for the bulk-indexing parameter source. Previously we had a hard-coded probability of 25%, but now the user can control it. We also now use `update` as the bulk-indexing action when simulating a conflict (previously the action was always `index`). Closes #422

* Improve simulation of bulk-indexing conflicts

With this commit we introduce a new property `conflict-probability` for the bulk-indexing parameter source. Previously we had a hard-coded probability of 25%, but now the user can control it. We also add a new property `on-conflict` which allows users to define whether the action-and-metadata line should use "index" or "update" on simulated id conflicts (the default is "index"). Closes #422
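A sketch of how a bulk operation could set the two new properties and how a parameter source might validate them. The property names and the `index` default for `on-conflict` come from the commit message above; the default of 25 for `conflict-probability` is an assumption that simply keeps the previously hard-coded value, and `conflict_settings` is a hypothetical helper. Any other bulk parameters a real track may need to enable conflict simulation are omitted.

```python
def conflict_settings(params):
    """Hypothetical validation of the bulk conflict properties.

    Returns the conflict probability as a fraction in [0, 1] and the action
    to use on a simulated conflict.
    """
    # Assumed default: keeps the previously hard-coded 25%.
    probability = params.get("conflict-probability", 25)
    # Default taken from the commit message: "index".
    on_conflict = params.get("on-conflict", "index")
    if isinstance(probability, bool) or not isinstance(probability, (int, float)):
        raise ValueError("conflict-probability must be a number")
    if not 0 <= probability <= 100:
        raise ValueError("conflict-probability must be between 0 and 100")
    if on_conflict not in ("index", "update"):
        raise ValueError("on-conflict must be 'index' or 'update'")
    return probability / 100.0, on_conflict


# A bulk operation as it could appear in a track, written as a Python dict.
operation = {
    "operation-type": "bulk",
    "conflict-probability": 50,
    "on-conflict": "update",
}
print(conflict_settings(operation))
```

Running the same corpus once with `"on-conflict": "index"` and once with `"update"` gives the side-by-side comparison the issue asks for.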

danielmitterdorfer added the enhancement and :Track Management labels Feb 27, 2018

danielmitterdorfer added this to the 1.x milestone Apr 12, 2018

danielmitterdorfer modified the milestones: 1.x, 0.10.2 Apr 20, 2018

danielmitterdorfer self-assigned this Apr 20, 2018

danielmitterdorfer mentioned this issue in "Improve simulation of bulk-indexing conflicts" #477 (merged) Apr 20, 2018

danielmitterdorfer closed this as completed in #477 Apr 25, 2018

danielmitterdorfer removed their assignment Apr 25, 2018
