
Memory prediction and task scaling #10

Merged

Conversation

friederici (Contributor)

This PR includes all changes required for memory prediction and task scaling.

friederici (Contributor Author)

This change is not required, but it is very useful during development: it allows different settings when running inside the development environment. Please advise whether this should be removed or kept.

Lehmann-Fabian (Member)

I like it!

@Lehmann-Fabian (Member) left a comment

Thanks a lot for this great feature. I have now reviewed it in detail and commented on things that we should discuss or that need adaptation. Please get back to me if something is unclear.

if (Boolean.TRUE.equals(o.success)) {
    // set model to peakRss + 10%
    if (model.containsKey(o.task)) {
        model.replace(o.task, o.peakRss.multiply(new BigDecimal("1.1")).setScale(0, RoundingMode.CEILING));
Lehmann-Fabian (Member)

Shouldn't const always use the highest value ever seen? This would replace it with the most recent one.

friederici (Contributor Author)

The current behaviour is:

 * - In case the task was successful:
 *   - let the next prediction be 10% higher than the peakRss was
 *
 * - In case the task has failed:
 *   - reset to initial value

If the scheduler provides the tasks in order, with the task that has the biggest input first, this would cause the predictions to follow and always shrink.

But I agree that different "constant" strategies could be taken, e.g.:

  • constant, biggest value seen
  • constant, lowest value seen
  • constant, latest value seen <- current approach
    ...

Lehmann-Fabian (Member)

Ordering by size descending is only one of many possible scheduling strategies; ordering could also be ascending, random, or FIFO. We should use the maximum with an x% offset.
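
A minimal sketch of that idea, reusing the model map and observation fields (o.task, o.peakRss) from the snippet above; the 10% offset is only illustrative, not a value fixed in this PR:

// keep the largest peakRss ever seen per task (plus an offset), instead of the most recent value
BigDecimal offset = new BigDecimal("1.1"); // illustrative x% offset
if (Boolean.TRUE.equals(o.success)) {
    BigDecimal candidate = o.peakRss.multiply(offset).setScale(0, RoundingMode.CEILING);
    model.merge(o.task, candidate, BigDecimal::max);
}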

} else {
    log.debug("overprovisioning value will increase due to task failure");
    Double old = overprovisioning.get(o.task);
    overprovisioning.put(o.task, old + 0.05);
Lehmann-Fabian (Member)

I assume this will increase at the beginning, when we might make a few wrong predictions. However, the overprovisioning is never decreased once we have more observations and possibly better predictions.
We can also leave this for the future, as this is a very cautious approach.

friederici (Contributor Author)

The overprovisioning was my first attempt at solving the problem of too-low estimates.
Later I learned that errorStrategy is set to terminate by default and maxRetries is very low (I believe this is 1 by default), so in practice this value will rarely grow very much.

Lehmann-Fabian (Member)

How about a strategy that checks the predictions for every task in the training data and determines the highest offset needed to fit all/95%/99% of the values? The percentage could be a user-defined hyperparameter.
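
A hypothetical sketch of that strategy, assuming the observationList and simpleRegression used by the predictor below; the 95% quantile is just one possible value for the user-defined hyperparameter:

// for every training observation, compute the relative offset that would have been
// needed for the prediction to cover the observed peak, then take a quantile of those
List<Double> requiredOffsets = new ArrayList<>();
for (Pair<Double, Double> o : observationList) {
    double predicted = simpleRegression.predict(o.getLeft());
    if (predicted > 0) {
        requiredOffsets.add(o.getRight() / predicted);
    }
}
double offset = 1.0;
if (!requiredOffsets.isEmpty()) {
    Collections.sort(requiredOffsets);
    double quantile = 0.95; // user-defined hyperparameter
    int index = Math.max(0, (int) Math.ceil(quantile * requiredOffsets.size()) - 1);
    offset = Math.max(1.0, requiredOffsets.get(index));
}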

if ("default".equalsIgnoreCase(predictor)) {
predictor = System.getenv("MEMORY_PREDICTOR_DEFAULT");
}
if (predictor == null) {
Lehmann-Fabian (Member)

This does not match the README, which says that no predictor is used if nothing is set.

Comment on lines +157 to +170
for (Pair<Double, Double> o : observationList) {
    double p = simpleRegression.predict(o.getLeft());
    double op = overprovisioning.get(taskName);
    if ((p * op) < o.getRight()) {
        // The model's predicted value would have been smaller than the
        // observed value. Our model is not (yet) appropriate.
        // Increase overprovisioning
        log.debug("overprovisioning value will increase due to model mismatch");
        Double old = overprovisioning.get(taskName);
        overprovisioning.put(taskName, old + 0.05);
        // Don't make a prediction this time
        return null;
    }
}
Lehmann-Fabian (Member)

Isn't this kind of logic also appropriate for reducing the overprovisioning later?
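
A sketch of the symmetric case, assuming the same observationList, simpleRegression, and overprovisioning map as above; the 0.05 step and the 1.0 lower bound are assumptions mirroring the increase step:

// if every observation would still fit with a smaller overprovisioning, lower it by one step
double op = overprovisioning.get(taskName);
double reduced = Math.max(1.0, op - 0.05);
boolean allStillFit = true;
for (Pair<Double, Double> o : observationList) {
    if (simpleRegression.predict(o.getLeft()) * reduced < o.getRight()) {
        allStillFit = false;
        break;
    }
}
if (allStillFit) {
    overprovisioning.put(taskName, reduced);
}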

Comment on lines 184 to 187
if (prediction > initialValue.get(taskName).doubleValue()) {
    log.warn("prediction would exceed initial value");
    return initialValue.get(taskName).toPlainString();
}
Lehmann-Fabian (Member)

Again, compare against the actual task's value.


if (taskScaler != null) {
    // change memory resource requests and limits here
    taskScaler.beforeTasksScheduled(unscheduledTasks);
Lehmann-Fabian (Member)

This loop is called frequently.
Can we predict the task's values only when new observations become available? E.g., save the version of the predictor (simply the number of observations) into the task when a prediction is made; then we only recalculate if the predictor has more observations.
Can we patch the task only when it is actually scheduled? I assume this is an expensive operation.
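
A hypothetical sketch of the version-gating idea; the names used here (predictionVersion, getObservationCount, predict, setPlannedMemoryRequest) are illustrative and not part of this PR:

for (Task task : unscheduledTasks) {
    int currentVersion = predictor.getObservationCount(task.getName());
    if (task.getPredictionVersion() == currentVersion) {
        continue; // no new observations since the last prediction, keep the cached value
    }
    BigDecimal newRequest = predictor.predict(task);
    if (newRequest != null) {
        task.setPlannedMemoryRequest(newRequest);
        task.setPredictionVersion(currentVersion);
    }
    // patching the pod spec itself is deferred until the task is actually scheduled
}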

Lehmann-Fabian (Member)

I like it!

@Lehmann-Fabian changed the base branch from master to memoryPrediction on March 4, 2024 at 15:35
@Lehmann-Fabian merged commit fb7ebac into CommonWorkflowScheduler:memoryPrediction on March 4, 2024
3 checks passed