Memory prediction and task scaling #10
Conversation
src/main/resources/application.yml (Outdated)
This change is not required, but it is very useful during development: it allows different settings when running inside the development environment. Please advise whether this should be removed or kept.
I like it!
Thanks a lot for this great feature. I reviewed it in detail now and commented on things that we should discuss or that need adaptations. Please get back to me if something is unclear.
if (Boolean.TRUE.equals(o.success)) {
    // set model to peakRss + 10%
    if (model.containsKey(o.task)) {
        model.replace(o.task, o.peakRss.multiply(new BigDecimal("1.1")).setScale(0, RoundingMode.CEILING));
Shouldn't the constant predictor always use the highest value ever seen? This replaces it with the most recent one.
The current behaviour is:
* - In case the task was successful:
* - let the next prediction be 10% higher than the peakRss was
*
* - In case the task has failed:
* - reset to the initial value
If the scheduler provides the tasks in order, with the task that has the biggest input first, this would cause the prediction to follow and always shrink.
But I agree that different "constant" strategies could be taken, e.g.:
- constant, biggest value seen
- constant, lowest value seen
- constant, latest value seen <- current approach
...
Ordering by size descending is only one of many possible scheduling strategies. Ordering could also be ascending, random, or FIFO. We should use the maximum with an x% offset.
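For illustration, a minimal sketch of the "maximum with x% offset" idea; the class, field, and constant names (MaxConstantPredictor, maxPeakRss, OFFSET_FACTOR) are invented here and not part of this PR:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: keep the largest peakRss ever observed per task and
// add a configurable offset on top of it (names are invented).
public class MaxConstantPredictor {

    // x% offset, e.g. 10%
    private static final BigDecimal OFFSET_FACTOR = new BigDecimal("1.10");

    private final Map<String, BigDecimal> maxPeakRss = new ConcurrentHashMap<>();

    public void observe(String task, BigDecimal peakRss, boolean success) {
        if (!success) {
            // failed tasks are handled by the retry / overprovisioning path
            return;
        }
        // merge keeps the maximum value ever seen for this task
        maxPeakRss.merge(task, peakRss, BigDecimal::max);
    }

    public BigDecimal predict(String task) {
        BigDecimal max = maxPeakRss.get(task);
        if (max == null) {
            // no successful observation yet, caller falls back to the initial value
            return null;
        }
        return max.multiply(OFFSET_FACTOR).setScale(0, RoundingMode.CEILING);
    }
}
```

Because merge with BigDecimal::max only ever keeps the larger value, later smaller observations can no longer shrink the prediction, regardless of the scheduling order.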
} else {
    log.debug("overprovisioning value will increase due to task failure");
    Double old = overprovisioning.get(o.task);
    overprovisioning.put(o.task, old + 0.05);
I assume this will increase at the beginning, where we might make a few wrong predictions. However, the overprovisioning is never decreased once we have more observations and maybe better predictions.
We can also leave this for the future, as this is a very cautious approach.
The overprovisioning was my first attempt at solving the problem of too low estimates.
Later I learned that errorStrategy is set to terminate by default and maxRetries is very low (I believe this is 1 by default). So it will mostly not happen that this value rises very much.
How about a strategy that checks the predictions for any task in the training data and determines the highest offset needed to fit all, 95%, or 99% of the values? The percentage could be a user-defined hyperparameter.
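Roughly what that could look like, assuming the training data is available as (input size, observed peakRss) pairs and a fitted SimpleRegression; the class name, method name, and coverage parameter are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.math3.stat.regression.SimpleRegression;

// Sketch only: find the smallest offset factor such that
// prediction * offset covers a given fraction of the observed values.
public class OffsetEstimator {

    /**
     * @param observations pairs of {inputSize, observedPeakRss}
     * @param coverage     fraction of observations to cover, e.g. 0.95 or 0.99
     */
    public static double requiredOffset(SimpleRegression regression,
                                        List<double[]> observations,
                                        double coverage) {
        List<Double> ratios = new ArrayList<>();
        for (double[] o : observations) {
            double predicted = regression.predict(o[0]);
            if (predicted > 0) {
                // offset needed so that predicted * offset >= observed
                ratios.add(o[1] / predicted);
            }
        }
        if (ratios.isEmpty()) {
            return 1.0;
        }
        ratios.sort(Double::compareTo);
        // pick the ratio at the requested coverage quantile
        int index = (int) Math.ceil(coverage * ratios.size()) - 1;
        index = Math.min(Math.max(index, 0), ratios.size() - 1);
        return Math.max(1.0, ratios.get(index));
    }
}
```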
if ("default".equalsIgnoreCase(predictor)) { | ||
predictor = System.getenv("MEMORY_PREDICTOR_DEFAULT"); | ||
} | ||
if (predictor == null) { |
This does not match the readme, which says there is no predictor if nothing is set.
for (Pair<Double, Double> o : observationList) {
    double p = simpleRegression.predict(o.getLeft());
    double op = overprovisioning.get(taskName);
    if ((p * op) < o.getRight()) {
        // The model predicted value would have been smaller than the
        // observed value. Our model is not (yet) appropriate.
        // Increase overprovisioning
        log.debug("overprovisioning value will increase due to model mismatch");
        Double old = overprovisioning.get(taskName);
        overprovisioning.put(taskName, old + 0.05);
        // Don't make a prediction this time
        return null;
    }
}
Isn't such logic also appropriate for reducing the overprovisioning later?
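Something like the following could sit next to the increase logic above; the class and method names, the 0.05 step, and the 1.05 lower bound are only assumptions for illustration, and it assumes the commons-lang3 Pair with getLeft()/getRight() as in the snippet above:

```java
import java.util.List;
import java.util.Map;
import org.apache.commons.lang3.tuple.Pair;
import org.apache.commons.math3.stat.regression.SimpleRegression;

// Sketch only: counterpart to the increase above. If a slightly lower
// overprovisioning would still have covered every observation, step it down.
public class OverprovisioningDecrease {

    public static void maybeDecrease(String taskName,
                                     List<Pair<Double, Double>> observationList,
                                     SimpleRegression simpleRegression,
                                     Map<String, Double> overprovisioning) {
        double op = overprovisioning.get(taskName);
        if (op <= 1.05) {
            // keep a minimal safety margin (threshold is an assumption)
            return;
        }
        double reduced = op - 0.05;
        for (Pair<Double, Double> o : observationList) {
            double p = simpleRegression.predict(o.getLeft());
            if (p * reduced < o.getRight()) {
                // the reduced value would not have covered this observation
                return;
            }
        }
        overprovisioning.put(taskName, reduced);
    }
}
```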
if (prediction > initialValue.get(taskName).doubleValue()) {
    log.warn("prediction would exceed initial value");
    return initialValue.get(taskName).toPlainString();
}
Again, compare to the actual task's value.
if (taskScaler != null) {
    // change memory resource requests and limits here
    taskScaler.beforeTasksScheduled(unscheduledTasks);
This loop is called frequently.
Can we predict the task's values only when new observations become available? E.g., save the version of the predictor (simply the number of observations) into the task when a prediction is done. Then we only recalculate if the predictor has more observations.
Can we patch the task only if it is actually scheduled? I assume this is an expensive operation.
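A sketch of the versioning idea, with invented names (PredictionCache, predictIfNewObservations); the "version" is simply the predictor's observation count as suggested above:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch only: remember, per task, how many observations the predictor had
// when the last prediction was computed, and recompute only when new
// observations have arrived.
public class PredictionCache {

    private final Map<String, Integer> predictedAtVersion = new ConcurrentHashMap<>();
    private final Map<String, String> cachedPrediction = new ConcurrentHashMap<>();

    public String predictIfNewObservations(String taskName,
                                           int observationCount, // current predictor "version"
                                           Supplier<String> computePrediction) {
        Integer lastVersion = predictedAtVersion.get(taskName);
        if (lastVersion != null && lastVersion == observationCount) {
            // no new observations since the last prediction, reuse it
            return cachedPrediction.get(taskName);
        }
        String prediction = computePrediction.get();
        predictedAtVersion.put(taskName, observationCount);
        if (prediction != null) {
            cachedPrediction.put(taskName, prediction);
        } else {
            cachedPrediction.remove(taskName);
        }
        return prediction;
    }
}
```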
Merged commit fb7ebac into CommonWorkflowScheduler:memoryPrediction
This PR includes all changes required for memory prediction and task scaling.