Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add task exclude filter #844

Merged
merged 4 commits into from
Dec 19, 2019
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion docs/adding_tracks.rst
Original file line number Diff line number Diff line change
Expand Up @@ -305,7 +305,7 @@ To specify different workloads in the same track you can use so-called challenge

When should you use challenges? Challenges are useful when you want to run completely different workloads based on the same track but for the majority of cases you should get away without using challenges:

* To run only a subset of the tasks, you can use :ref:`task filtering <clr_include_tasks>`, e.g. ``--include-tasks="create-index,bulk"`` will only run these two tasks in the track above.
* To run only a subset of the tasks, you can use :ref:`task filtering <clr_include_tasks>`, e.g. ``--include-tasks="create-index,bulk"`` will only run these two tasks in the track above or ``--exclude-tasks="bulk"`` will run all tasks except for `bulk`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: missing period.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to use double-backticks so it is

``bulk``

instead of

`bulk`

* To vary parameters, e.g. the number of clients, you can use :ref:`track parameters <clr_track_params>`

Structuring your track
Expand Down
13 changes: 13 additions & 0 deletions docs/command_line_reference.rst
Original file line number Diff line number Diff line change
Expand Up @@ -287,6 +287,19 @@ You can use ``--include-tasks`` to specify a comma-separated list of tasks that
* Execute only tasks of type ``search``: ``--include-tasks="type:search"``
* You can also mix and match: ``--include-tasks="index,type:search"``

``exclude-tasks``
~~~~~~~~~~~~~~~~~

Similarly to :ref:`include-tasks <clr_include_tasks>` when challenge consists of one or more tasks you might be interested in excluding a single operations but include the rest.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: when a challenge.


You can use ``--exclude-tasks`` to specify a comma-separated list of tasks that you want to skip. Each item in the list defines either the name of a task or the operation type of a task. Only the tasks that did not match will be executed.

**Examples**:

* Do not execute any tasks with the name ``index`` and ``term``: ``--exclude-tasks="index,term"``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Use "Skip" instead of "Do not execute"?

* Do not execute any tasks of type ``search``: ``--exclude-tasks="type:search"``
* You can also mix and match: ``--exclude-tasks="index,type:search"``

``team-repository``
~~~~~~~~~~~~~~~~~~~

Expand Down
4 changes: 3 additions & 1 deletion esrally/track/loader.py
Original file line number Diff line number Diff line change
Expand Up @@ -632,6 +632,8 @@ def filter_tasks(t, filters, exclude=False):
def filter_out_match(task, filters, exclude):
for f in filters:
if task.matches(f):
if hasattr(task, 'tasks') and exclude:
return False
return exclude
return not exclude

Expand Down Expand Up @@ -673,7 +675,7 @@ def filters_from_filtered_tasks(filtered_tasks):
filters.append(track.TaskOpTypeFilter(spec[1]))
else:
raise exceptions.SystemSetupError(
"Invalid format for included tasks: [%s]. Expected [type] but got [%s]." % (t, spec[0]))
"Invalid format for filtered tasks: [%s]. Expected [type] but got [%s]." % (t, spec[0]))
else:
raise exceptions.SystemSetupError("Invalid format for filtered tasks: [%s]" % t)
return filters
Expand Down
12 changes: 6 additions & 6 deletions tests/track/loader_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -1256,11 +1256,11 @@ def test_sets_absolute_path(self, path_exists):


class TrackFilterTests(TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please change the test case method names? They sometimes reference the term "included" which is outdated now.

def test_create_filters_from_empty_included_tasks(self):
def test_create_filters_from_empty_filtered_tasks(self):
self.assertEqual(0, len(loader.filters_from_filtered_tasks(None)))
self.assertEqual(0, len(loader.filters_from_filtered_tasks([])))

def test_create_filters_from_mixed_included_tasks(self):
def test_create_filters_from_mixed_filtered_tasks(self):
filters = loader.filters_from_filtered_tasks(["force-merge", "type:search"])
self.assertListEqual([track.TaskNameFilter("force-merge"), track.TaskOpTypeFilter("search")], filters)

Expand All @@ -1272,7 +1272,7 @@ def test_rejects_invalid_syntax(self):
def test_rejects_unknown_filter_type(self):
with self.assertRaises(exceptions.SystemSetupError) as ctx:
loader.filters_from_filtered_tasks(["valid", "op-type:index"])
self.assertEqual("Invalid format for included tasks: [op-type:index]. Expected [type] but got [op-type].", ctx.exception.args[0])
self.assertEqual("Invalid format for filtered tasks: [op-type:index]. Expected [type] but got [op-type].", ctx.exception.args[0])

def test_filters_tasks(self):
track_specification = {
Expand Down Expand Up @@ -1428,12 +1428,12 @@ def test_filters_exclude_tasks(self):
full_track = reader("unittest", track_specification, "/mappings")
self.assertEqual(4, len(full_track.challenges[0].schedule))

filtered = loader.filter_tasks(full_track, [track.TaskNameFilter("index-3")], exclude=True)
filtered = loader.filter_tasks(full_track, [track.TaskNameFilter("index-3"), track.TaskOpTypeFilter("search")], exclude=True)

schedule = filtered.challenges[0].schedule
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we exclude one of the parallel tasks then nothing in that group runs.. should I correct the behaviour for that or does it make sense like that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we exclude a single task in a parallel element, only that single one should be excluded but all others should still be included.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had another look and I meant something else than what is implemented at the moment. I have visualized the challenge index-and-query-logs-fixed-daily-volume from the eventdata track here which contains such a parallel element (I've simplified it to only simulate a single day):

1. measure-maximum-utilization (8 clients)
2. delete-measurement-index
3. 5 parallel tasks (12 clients):
	3.1 bulk-index-logs-100%-utilization (8 clients)
	3.2 current-kibana-traffic-country-dashboard_60m-querying-100%-utilization
	3.3 current-kibana-discover_30m-querying-100%-utilization
	3.4 current-kibana-traffic-dashboard_30m-querying-100%-utilization
	3.5 current-kibana-content_issues-dashboard_30m-querying-100%-utilization

If we specify --exclude-tasks="type:search", I expect the following schedule:

1. measure-maximum-utilization (8 clients)
2. delete-measurement-index
3. 1 parallel task (8 clients):
	3.1 bulk-index-logs-100%-utilization (8 clients)

but instead we get:

1. measure-maximum-utilization (8 clients)
2. delete-measurement-index
3. 5 parallel tasks (12 clients):
	3.1 bulk-index-logs-100%-utilization (8 clients)
	3.2 current-kibana-traffic-country-dashboard_60m-querying-100%-utilization
	3.3 current-kibana-discover_30m-querying-100%-utilization
	3.4 current-kibana-traffic-dashboard_30m-querying-100%-utilization
	3.5 current-kibana-content_issues-dashboard_30m-querying-100%-utilization

Can you please have another look?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, I've created this output with a new info subcommand that I've added to Rally. It already includes support for your change but as it is not merged yet I've only pushed it as draft PR #850.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scratch that. As discussed offline, the operation type is kibana not search. So the filter is actually working fine. Sorry for the confusion. :)

self.assertEqual(3, len(schedule))
self.assertEqual("node-stats", schedule[0].name)
self.assertEqual("match-all-serial", schedule[1].name)
self.assertEqual(["index-1",'index-2'], [t.name for t in schedule[0].tasks])
self.assertEqual("node-stats", schedule[1].name)
self.assertEqual("cluster-stats", schedule[2].name)

class TrackSpecificationReaderTests(TestCase):
Expand Down