Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add task exclude filter #844

Merged
merged 4 commits into from
Dec 19, 2019
Merged

Add task exclude filter #844

merged 4 commits into from
Dec 19, 2019

Conversation

ebadyano
Copy link
Contributor

Closes #496

@ebadyano ebadyano added enhancement Improves the status quo :Usability Makes Rally easier to use :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc. labels Dec 16, 2019
@ebadyano ebadyano added this to the 1.4.0 milestone Dec 16, 2019

filtered = loader.filter_tasks(full_track, [track.TaskNameFilter("index-3")], exclude=True)

schedule = filtered.challenges[0].schedule
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we exclude one of the parallel tasks then nothing in that group runs.. should I correct the behaviour for that or does it make sense like that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we exclude a single task in a parallel element, only that single one should be excluded but all others should still be included.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had another look and I meant something else than what is implemented at the moment. I have visualized the challenge index-and-query-logs-fixed-daily-volume from the eventdata track here which contains such a parallel element (I've simplified it to only simulate a single day):

1. measure-maximum-utilization (8 clients)
2. delete-measurement-index
3. 5 parallel tasks (12 clients):
	3.1 bulk-index-logs-100%-utilization (8 clients)
	3.2 current-kibana-traffic-country-dashboard_60m-querying-100%-utilization
	3.3 current-kibana-discover_30m-querying-100%-utilization
	3.4 current-kibana-traffic-dashboard_30m-querying-100%-utilization
	3.5 current-kibana-content_issues-dashboard_30m-querying-100%-utilization

If we specify --exclude-tasks="type:search", I expect the following schedule:

1. measure-maximum-utilization (8 clients)
2. delete-measurement-index
3. 1 parallel task (8 clients):
	3.1 bulk-index-logs-100%-utilization (8 clients)

but instead we get:

1. measure-maximum-utilization (8 clients)
2. delete-measurement-index
3. 5 parallel tasks (12 clients):
	3.1 bulk-index-logs-100%-utilization (8 clients)
	3.2 current-kibana-traffic-country-dashboard_60m-querying-100%-utilization
	3.3 current-kibana-discover_30m-querying-100%-utilization
	3.4 current-kibana-traffic-dashboard_30m-querying-100%-utilization
	3.5 current-kibana-content_issues-dashboard_30m-querying-100%-utilization

Can you please have another look?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, I've created this output with a new info subcommand that I've added to Rally. It already includes support for your change but as it is not merged yet I've only pushed it as draft PR #850.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scratch that. As discussed offline, the operation type is kibana not search. So the filter is actually working fine. Sorry for the confusion. :)

Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did an initial pass and it looks quite good already. Can you please also add the new flag to the command line reference documentation?

@@ -1257,21 +1257,21 @@ def test_sets_absolute_path(self, path_exists):

class TrackFilterTests(TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please change the test case method names? They sometimes reference the term "included" which is outdated now.


filtered = loader.filter_tasks(full_track, [track.TaskNameFilter("index-3")], exclude=True)

schedule = filtered.challenges[0].schedule
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we exclude a single task in a parallel element, only that single one should be excluded but all others should still be included.

Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating. I left a few more comments.


filtered = loader.filter_tasks(full_track, [track.TaskNameFilter("index-3")], exclude=True)

schedule = filtered.challenges[0].schedule
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had another look and I meant something else than what is implemented at the moment. I have visualized the challenge index-and-query-logs-fixed-daily-volume from the eventdata track here which contains such a parallel element (I've simplified it to only simulate a single day):

1. measure-maximum-utilization (8 clients)
2. delete-measurement-index
3. 5 parallel tasks (12 clients):
	3.1 bulk-index-logs-100%-utilization (8 clients)
	3.2 current-kibana-traffic-country-dashboard_60m-querying-100%-utilization
	3.3 current-kibana-discover_30m-querying-100%-utilization
	3.4 current-kibana-traffic-dashboard_30m-querying-100%-utilization
	3.5 current-kibana-content_issues-dashboard_30m-querying-100%-utilization

If we specify --exclude-tasks="type:search", I expect the following schedule:

1. measure-maximum-utilization (8 clients)
2. delete-measurement-index
3. 1 parallel task (8 clients):
	3.1 bulk-index-logs-100%-utilization (8 clients)

but instead we get:

1. measure-maximum-utilization (8 clients)
2. delete-measurement-index
3. 5 parallel tasks (12 clients):
	3.1 bulk-index-logs-100%-utilization (8 clients)
	3.2 current-kibana-traffic-country-dashboard_60m-querying-100%-utilization
	3.3 current-kibana-discover_30m-querying-100%-utilization
	3.4 current-kibana-traffic-dashboard_30m-querying-100%-utilization
	3.5 current-kibana-content_issues-dashboard_30m-querying-100%-utilization

Can you please have another look?

``exclude-tasks``
~~~~~~~~~~~~~~~~~

Similarly to :ref:`include-tasks <clr_include_tasks>` when challenge consists of one or more tasks you might be interested in excluding a single operations but include the rest.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: when a challenge.

@@ -305,7 +305,7 @@ To specify different workloads in the same track you can use so-called challenge

When should you use challenges? Challenges are useful when you want to run completely different workloads based on the same track but for the majority of cases you should get away without using challenges:

* To run only a subset of the tasks, you can use :ref:`task filtering <clr_include_tasks>`, e.g. ``--include-tasks="create-index,bulk"`` will only run these two tasks in the track above.
* To run only a subset of the tasks, you can use :ref:`task filtering <clr_include_tasks>`, e.g. ``--include-tasks="create-index,bulk"`` will only run these two tasks in the track above or ``--exclude-tasks="bulk"`` will run all tasks except for `bulk`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: missing period.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to use double-backticks so it is

``bulk``

instead of

`bulk`


**Examples**:

* Do not execute any tasks with the name ``index`` and ``term``: ``--exclude-tasks="index,term"``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Use "Skip" instead of "Do not execute"?

Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM

@ebadyano ebadyano merged commit 15948fb into elastic:master Dec 19, 2019
ebadyano added a commit to ebadyano/rally that referenced this pull request Jan 10, 2020
ebadyano added a commit that referenced this pull request Jan 10, 2020
@ebadyano ebadyano deleted the master-filter branch December 16, 2022 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improves the status quo :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc. :Usability Makes Rally easier to use
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add task exclude filter
2 participants