Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for transform API's #1011

Closed
wants to merge 17 commits into from

Conversation

hendrikmuhs
Copy link

add support for transform, consisting of the following parts:

  • create, start, stop, delete API support
  • transform-stats telemetry device to collect transform stats
  • basic command-line reporting
  • some basic documentation

Note: I will add definitions for the recorded metrics later after the 1st round of reviews.

@hendrikmuhs hendrikmuhs added the enhancement Improves the status quo label Jun 8, 2020
@danielmitterdorfer danielmitterdorfer added this to the 2.0.1 milestone Jun 8, 2020
@danielmitterdorfer danielmitterdorfer added :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc. :Telemetry Telemetry Devices that gather additional metrics labels Jun 8, 2020
@hendrikmuhs
Copy link
Author

hendrikmuhs commented Jun 9, 2020

I realized that for the purpose of benchmarking a transform, it makes more sense to have a dedicated execute-transform operation that starts, stops the transform and collects the stats.

It seems still useful to have the single start and stop operations, so I do not see it as replacement but as addition. However, I am not sure about the telemetry device (probably it makes sense together with https://github.com/elastic/rally-eventdata-track).

@hendrikmuhs hendrikmuhs force-pushed the transform-support-2 branch from f3df0c7 to c58e146 Compare June 15, 2020 09:51
@ebadyano ebadyano self-assigned this Jun 15, 2020
docs/track.rst Outdated
With the operation ``execute-transform`` you can execute a full run of data transformation, it starts the transform and stops it once all data has been transformed. The transform must be created upfront. The ``execute-transform`` operation supports the following parameters:

* ``transform-id`` (mandatory): The id of the transform to execute.
* ``poll-interval`` (optional, defaults to `0.5`) How often transform stats are polled, used to set progress and check the state.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to specify the time unit for poll-interval? i.e. 0.5 (seconds)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, would the poll-interval also affect the performance? Not sure if we want to mention that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it might affect performance, but I can't say how much. It shouldn't be expensive.

FWIW: I debugged the progress reporting and it seems progress isn't working in rally. Is there an existing issue? Is there a plan to fix this? If not, it might be better to remove the progress report.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Progress reporting might be a bit of a trade of
Functions that implement completed and percent_completed are called until _completed isn't set to True. The issue is that execute-transform just has a while loop and blocks there until completed is set to True, so percent_completed is never updated. For it to work we would need to have start_transform outside of the execute_transform and execute_transform should just check get_transform_stats and update _percent_completed outside of the loop and after the sleep and and once it's done set _completed to True. In this case it probably should be named something like wait-for-transformation. And in the track we would need to have both operations start_transformation followed by wait-for-transformation. We used to have an example of that for snapshot recovery operation, but it got changed to just waiting for the restore to complete in a loop without updating percent_completion to get more accurate throughout and how long it took to complete information as otherwise it could be off by the sleep time. Here is the change 9f73d5d where you can see both how it was before and after. Although i think in your case it should be okay to use percentage completed, since you are relaying on information from get_transform_stats to get the performance..

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the explanation! Lets discuss this later, I lean towards keeping execute-transform the way it is and rather remove the progress info.

@ebadyano
Copy link
Contributor

What is the difference between what transform-stats telemetry device and execute-transform do?

@hendrikmuhs
Copy link
Author

What is the difference between what transform-stats telemetry device and execute-transform do?

see my previous comment: #1011 (comment)

execute-transform and the transform telemetry device measure the same metrics. After some thought, I still think we need both:

The execute-transform is operation is useful for measuring the performance of single transform executions, e.g. to test optimizations in either the transform core or in the underlying aggregation framework. These performance tests would be one-off executions.

The telemetry device is useful for testing continuous mode in transform. This is where something like the event generator comes into play. This is useful to test whether a cluster setup can handle a certain load of incoming data.

So, despite my previous comment, I now think we should have both.

@hendrikmuhs
Copy link
Author

@ebadyano and I discussed this PR:

  • execute-transform should be rewritten to wait-for-transform or merged into stop-transform, instead of blocking inside, it will use the completion framework and let rally run the outer polling loop (Rally will call stop/wait-for-transform until _completed=true)
    • in followup PR's the API can be extended to wait until a certain checkpoint has been reached. This will cover the continuous transform usecase (in the continuous case a transform should run several checkpoints and e.g. transform a certain amount of data/time)
  • having the above, the telemetry device becomes superfluous: I will remove it from the PR(I will keep the code in case we find a usecase later)

@hub-cap hub-cap changed the title add suport for transform API's add support for transform API's Jun 29, 2020
@hendrikmuhs
Copy link
Author

closing this one in favor of #1022

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improves the status quo :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc. :Telemetry Telemetry Devices that gather additional metrics
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants