add support for transform API's #1011

hendrikmuhs · 2020-06-08T10:23:39Z

add support for transform, consisting of the following parts:

create, start, stop, delete API support
transform-stats telemetry device to collect transform stats
basic command-line reporting
some basic documentation

Note: I will add definitions for the recorded metrics later after the 1st round of reviews.

esrally/driver/runner.py

hendrikmuhs · 2020-06-09T15:44:34Z

I realized that for the purpose of benchmarking a transform, it makes more sense to have a dedicated execute-transform operation that starts, stops the transform and collects the stats.

It seems still useful to have the single start and stop operations, so I do not see it as replacement but as addition. However, I am not sure about the telemetry device (probably it makes sense together with https://github.com/elastic/rally-eventdata-track).

docs/telemetry.rst

esrally/driver/runner.py

tests/driver/runner_test.py

docs/telemetry.rst

ebadyano · 2020-06-18T20:02:14Z

docs/track.rst

+With the operation ``execute-transform`` you can execute a full run of data transformation, it starts the transform and stops it once all data has been transformed. The transform must be created upfront. The ``execute-transform`` operation supports the following parameters:
+
+* ``transform-id`` (mandatory): The id of the transform to execute.
+* ``poll-interval`` (optional, defaults to `0.5`) How often transform stats are polled, used to set progress and check the state.


Does it make sense to specify the time unit for poll-interval? i.e. 0.5 (seconds)?

Also, would the poll-interval also affect the performance? Not sure if we want to mention that.

Yes, it might affect performance, but I can't say how much. It shouldn't be expensive.

FWIW: I debugged the progress reporting and it seems progress isn't working in rally. Is there an existing issue? Is there a plan to fix this? If not, it might be better to remove the progress report.

Progress reporting might be a bit of a trade of
Functions that implement completed and percent_completed are called until _completed isn't set to True. The issue is that execute-transform just has a while loop and blocks there until completed is set to True, so percent_completed is never updated. For it to work we would need to have start_transform outside of the execute_transform and execute_transform should just check get_transform_stats and update _percent_completed outside of the loop and after the sleep and and once it's done set _completed to True. In this case it probably should be named something like wait-for-transformation. And in the track we would need to have both operations start_transformation followed by wait-for-transformation. We used to have an example of that for snapshot recovery operation, but it got changed to just waiting for the restore to complete in a loop without updating percent_completion to get more accurate throughout and how long it took to complete information as otherwise it could be off by the sleep time. Here is the change 9f73d5d where you can see both how it was before and after. Although i think in your case it should be okay to use percentage completed, since you are relaying on information from get_transform_stats to get the performance..

thanks for the explanation! Lets discuss this later, I lean towards keeping execute-transform the way it is and rather remove the progress info.

ebadyano · 2020-06-18T20:22:43Z

What is the difference between what transform-stats telemetry device and execute-transform do?

hendrikmuhs · 2020-06-23T14:38:55Z

What is the difference between what transform-stats telemetry device and execute-transform do?

see my previous comment: #1011 (comment)

execute-transform and the transform telemetry device measure the same metrics. After some thought, I still think we need both:

The execute-transform is operation is useful for measuring the performance of single transform executions, e.g. to test optimizations in either the transform core or in the underlying aggregation framework. These performance tests would be one-off executions.

The telemetry device is useful for testing continuous mode in transform. This is where something like the event generator comes into play. This is useful to test whether a cluster setup can handle a certain load of incoming data.

So, despite my previous comment, I now think we should have both.

esrally/telemetry.py

tests/telemetry_test.py

esrally/driver/runner.py

esrally/telemetry.py

esrally/driver/runner.py

hendrikmuhs · 2020-06-26T13:56:10Z

@ebadyano and I discussed this PR:

execute-transform should be rewritten to wait-for-transform or merged into stop-transform, instead of blocking inside, it will use the completion framework and let rally run the outer polling loop (Rally will call stop/wait-for-transform until _completed=true)
- in followup PR's the API can be extended to wait until a certain checkpoint has been reached. This will cover the continuous transform usecase (in the continuous case a transform should run several checkpoints and e.g. transform a certain amount of data/time)
having the above, the telemetry device becomes superfluous: I will remove it from the PR(I will keep the code in case we find a usecase later)

hendrikmuhs · 2020-06-30T08:17:11Z

closing this one in favor of #1022

hendrikmuhs added the enhancement Improves the status quo label Jun 8, 2020

danielmitterdorfer added this to the 2.0.1 milestone Jun 8, 2020

danielmitterdorfer requested a review from ebadyano June 8, 2020 11:36

danielmitterdorfer added :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc. :Telemetry Telemetry Devices that gather additional metrics labels Jun 8, 2020

hendrikmuhs commented Jun 9, 2020

View reviewed changes

esrally/driver/runner.py Show resolved Hide resolved

hendrikmuhs added 5 commits June 15, 2020 11:45

add suport for transform

6faa39d

add an execute-transform runner

4ab0d66

improve execute transform

eeaff59

apply lint suggestion

d729e5d

adapt to upstream changes

c58e146

hendrikmuhs force-pushed the transform-support-2 branch from f3df0c7 to c58e146 Compare June 15, 2020 09:51

ebadyano self-assigned this Jun 15, 2020

ebadyano reviewed Jun 16, 2020

View reviewed changes

docs/telemetry.rst Show resolved Hide resolved

ebadyano reviewed Jun 16, 2020

View reviewed changes

esrally/driver/runner.py Show resolved Hide resolved

ebadyano reviewed Jun 17, 2020

View reviewed changes

esrally/driver/runner.py Show resolved Hide resolved

hendrikmuhs added 3 commits June 17, 2020 14:42

improve documentation

9c08cd1

add tests for admin operations

7e73926

add a test for execute-transform

834c898

ebadyano reviewed Jun 17, 2020

View reviewed changes

tests/driver/runner_test.py Show resolved Hide resolved

fix progess

7d2e4ac

hendrikmuhs mentioned this pull request Jun 17, 2020

add a transform challenge elastic/rally-tracks#128

Merged

fix code style

47571ed

ebadyano reviewed Jun 18, 2020

View reviewed changes

docs/telemetry.rst Outdated Show resolved Hide resolved

ebadyano reviewed Jun 18, 2020

View reviewed changes

hendrikmuhs added 2 commits June 24, 2020 17:00

add a warning regarding polling too often

3729535

remove segment-stats from console output

0773811