-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add support for transform API's #1011
Conversation
I realized that for the purpose of benchmarking a transform, it makes more sense to have a dedicated It seems still useful to have the single start and stop operations, so I do not see it as replacement but as addition. However, I am not sure about the telemetry device (probably it makes sense together with https://github.com/elastic/rally-eventdata-track). |
f3df0c7
to
c58e146
Compare
docs/track.rst
Outdated
With the operation ``execute-transform`` you can execute a full run of data transformation, it starts the transform and stops it once all data has been transformed. The transform must be created upfront. The ``execute-transform`` operation supports the following parameters: | ||
|
||
* ``transform-id`` (mandatory): The id of the transform to execute. | ||
* ``poll-interval`` (optional, defaults to `0.5`) How often transform stats are polled, used to set progress and check the state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to specify the time unit for poll-interval
? i.e. 0.5
(seconds)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, would the poll-interval also affect the performance? Not sure if we want to mention that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it might affect performance, but I can't say how much. It shouldn't be expensive.
FWIW: I debugged the progress reporting and it seems progress isn't working in rally. Is there an existing issue? Is there a plan to fix this? If not, it might be better to remove the progress report.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Progress reporting might be a bit of a trade of
Functions that implement completed
and percent_completed
are called until _completed
isn't set to True. The issue is that execute-transform
just has a while loop and blocks there until completed is set to True, so percent_completed
is never updated. For it to work we would need to have start_transform
outside of the execute_transform
and execute_transform
should just check get_transform_stats
and update _percent_completed
outside of the loop and after the sleep and and once it's done set _completed
to True. In this case it probably should be named something like wait-for-transformation
. And in the track we would need to have both operations start_transformation
followed by wait-for-transformation
. We used to have an example of that for snapshot recovery operation, but it got changed to just waiting for the restore to complete in a loop without updating percent_completion
to get more accurate throughout and how long it took to complete information as otherwise it could be off by the sleep time. Here is the change 9f73d5d where you can see both how it was before and after. Although i think in your case it should be okay to use percentage completed, since you are relaying on information from get_transform_stats to get the performance..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the explanation! Lets discuss this later, I lean towards keeping execute-transform
the way it is and rather remove the progress info.
What is the difference between what transform-stats telemetry device and execute-transform do? |
see my previous comment: #1011 (comment)
The The telemetry device is useful for testing continuous mode in transform. This is where something like the event generator comes into play. This is useful to test whether a cluster setup can handle a certain load of incoming data. So, despite my previous comment, I now think we should have both. |
@ebadyano and I discussed this PR:
|
closing this one in favor of #1022 |
add support for transform, consisting of the following parts:
Note: I will add definitions for the recorded metrics later after the 1st round of reviews.