Memory prediction and task scaling #10

Merged

Commits (101 total; changes shown from 85 commits)
60a58ee
collect task execution results and store them in the memory optimizer
friederici Sep 27, 2023
dda273a
restructured TaskScaler
friederici Oct 18, 2023
a846aa2
cleanup debug logs
friederici Oct 18, 2023
a87afc1
cleanup Scheduler
friederici Oct 19, 2023
ffcddd4
cleanup Task
friederici Oct 19, 2023
4b7640d
cleanup Task
friederici Oct 19, 2023
8eadf40
add hook after workflow is completed
friederici Oct 19, 2023
537df5d
change MemoryOptimizer to be an interface, add two different Optimizers
friederici Oct 19, 2023
82aaf7f
round suggestions to ceiling
friederici Oct 20, 2023
0826b91
introduced LinearPredictor
friederici Oct 20, 2023
440add6
changed indentation to 4 spaces, like rest of the project uses
friederici Oct 20, 2023
cd5a66a
fix mis-formatting
friederici Oct 20, 2023
43f07e7
fix typos
friederici Oct 20, 2023
445a84b
initial implementation linearPredictor
friederici Oct 20, 2023
02fee25
builder for observations
friederici Oct 24, 2023
8755f00
added test for constant predictor
friederici Oct 24, 2023
fde2665
remove wasted calculation from observation
friederici Oct 24, 2023
daecba1
remove wasted calculation from observation
friederici Oct 24, 2023
01185df
add NonePredictorTest
friederici Oct 24, 2023
827e2b4
sanity checks for observations
friederici Oct 24, 2023
cbcee82
add negative case for ConstantPredictor
friederici Oct 24, 2023
66ab197
assert rise and fall of suggestions
friederici Oct 24, 2023
9f8b576
added LinearPredictorTest
friederici Oct 24, 2023
61bd311
avoid negative preditions
friederici Oct 24, 2023
14ef86c
use SimpleRegression for LinearPredictor
friederici Oct 24, 2023
4098e74
fix naming to always be prediction, instead of suggestion
friederici Oct 24, 2023
a081142
fix naming
friederici Oct 24, 2023
dceab80
fix some minor issues
friederici Oct 25, 2023
f705f61
collect statistics
friederici Oct 25, 2023
c126281
removed Limits, rely solely on Requests instead
friederici Oct 26, 2023
2e8d14f
added new CombiPredictor
friederici Oct 26, 2023
d9e578b
remove solved fixme
friederici Oct 26, 2023
b738ac4
csv export
friederici Oct 26, 2023
9b61cfc
Merge pull request #1 from CommonWorkflowScheduler/master
friederici Oct 26, 2023
10ceba7
save statistics summary and csv into file in workflow baseDir
friederici Oct 26, 2023
47222c1
added Tasks realtime to statistic, moved NfTrace reader to own utilit…
friederici Oct 27, 2023
b503166
add test if trace file is missing and handling for that
friederici Oct 27, 2023
c3cc4f2
fix no resize when request was 0
friederici Oct 29, 2023
f61f99c
apply config only in dev profile
friederici Oct 29, 2023
a732d2f
statistics log execution and predictor
friederici Oct 29, 2023
6bc2bc0
log makespan
friederici Oct 29, 2023
a38cadc
add peak_vmem for sanity checks
friederici Oct 29, 2023
2e860d5
add unit tests for statistics
friederici Oct 30, 2023
61e9f96
added todos for missing testcases
friederici Oct 30, 2023
1ec1417
fix for-loop should continue, not break
friederici Nov 1, 2023
db4e822
only invoke TaskScaler when config was given
friederici Nov 1, 2023
2ede083
get memory predictor from config, not from environment
friederici Nov 1, 2023
2bd9a00
removed double code
friederici Nov 1, 2023
160dd71
prepare application.yml for merge
friederici Nov 3, 2023
1da54a8
prepare application.yml for merge
friederici Nov 3, 2023
5b21105
fixed decimal seperator
friederici Nov 3, 2023
0ceb17a
Merge pull request #4 from CommonWorkflowScheduler/master
friederici Nov 3, 2023
ed71e1d
fix decimal seperator
friederici Nov 3, 2023
a352a4b
changed logging in dev profile
friederici Nov 3, 2023
838faee
improved predictor selection order
friederici Nov 3, 2023
5f6d2a1
added template for square predictor
friederici Nov 4, 2023
1864d9d
collect wasted in summary
friederici Nov 4, 2023
9373302
add wasted to statistics
friederici Nov 4, 2023
b07c526
avoid updating tasks when no new model is available
friederici Nov 4, 2023
6e5a35a
added new testcases
friederici Nov 4, 2023
c8caf00
change return value for missing file to -1
friederici Nov 6, 2023
c6ab571
changed sanity check
friederici Nov 6, 2023
3896718
fix constant predictor
friederici Nov 6, 2023
af35f16
faster overprovisioning
friederici Nov 6, 2023
9a99b48
add wary predictor
friederici Nov 7, 2023
ae72d0c
fix imports
friederici Nov 7, 2023
bd1a1bc
wary predictor
friederici Nov 7, 2023
66730f9
filter realtime 0
friederici Nov 8, 2023
d7f093e
use vmem instead of rss
friederici Nov 8, 2023
54e907e
correct tests
friederici Nov 8, 2023
c3c2ebb
require 4 successful observations
friederici Nov 8, 2023
496b82a
ignore list feature
friederici Nov 8, 2023
8478ee4
never provide predictions lower than the lowest successful value was
friederici Nov 8, 2023
cd70d2d
prevent cws from get stuck
friederici Nov 11, 2023
6f01832
blacklist failed tasks
friederici Nov 16, 2023
c878d7f
removed flawed wasted from csv, added assigned node
friederici Nov 18, 2023
a74c20e
lower limit for request size 256MiB
friederici Nov 19, 2023
e32c74a
fix bad naming
friederici Nov 23, 2023
98b64c2
junit test for TaskScaler
friederici Dec 6, 2023
34117eb
add remark for TaskScalerTest
friederici Dec 6, 2023
a96a151
fix used predictor
friederici Dec 6, 2023
8f4ca6f
removed unimplemended square predictor
friederici Feb 20, 2024
0fb4ea2
removed unused generation feature from constant predictor
friederici Feb 20, 2024
d513125
removed unused generation feature
friederici Feb 20, 2024
1791a1a
cleanup classname
friederici Feb 20, 2024
aa6f7ea
removed testcase that is no longer in line with desired behaviour
friederici Feb 20, 2024
18fa249
fixed comments
friederici Feb 20, 2024
f698ce6
add description to README
friederici Feb 20, 2024
e34a4e5
add description to README
friederici Feb 20, 2024
5c67ec1
catch exception that is thrown when InPlacePodVerticalScaling is not …
friederici Feb 27, 2024
e6e3f63
add note on profiles in README
friederici Feb 27, 2024
9a1064f
always write log to file
friederici Feb 27, 2024
bdf74ce
check reason for exception and improve error message, then disable ta…
friederici Feb 28, 2024
fadd42d
fix comment
friederici Mar 2, 2024
6bd513e
fix formatting
friederici Mar 2, 2024
89f1360
moved patchTaskMemory method
friederici Mar 2, 2024
647da6d
add tracing note in README
friederici Mar 2, 2024
49bebe7
reduce loglevel
friederici Mar 2, 2024
7ba49db
change predictor interface to return BigDecimal
friederici Mar 3, 2024
fc76c77
extracted constant for lowest memory request value
friederici Mar 3, 2024
5f6fb5e
add o.taskName to log, when available
friederici Mar 3, 2024
6 changes: 6 additions & 0 deletions pom.xml
@@ -134,6 +134,12 @@
<artifactId>jackson-annotations</artifactId>
</dependency>

<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-math3</artifactId>
<version>3.6.1</version>
</dependency>

</dependencies>

<build>
85 changes: 85 additions & 0 deletions src/main/java/cws/k8s/scheduler/memory/CombiPredictor.java
@@ -0,0 +1,85 @@
/*
* Copyright (c) 2023, Florian Friederici. All rights reserved.
*
* This code is free software: you can redistribute it and/or modify it under
* the terms of the GNU General Public License as published by the Free
* Software Foundation, either version 3 of the License, or (at your option)
* any later version.
*
* This code is distributed in the hope that it will be useful, but WITHOUT ANY
* WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
* FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
* details.
*
* You should have received a copy of the GNU General Public License along with
* this work. If not, see <https://www.gnu.org/licenses/>.
*/

package cws.k8s.scheduler.memory;

import java.math.BigDecimal;

import cws.k8s.scheduler.model.Task;
import lombok.extern.slf4j.Slf4j;

//@formatter:off
/**
 * CombiPredictor combines the predictions made by ConstantPredictor and
 * LinearPredictor.
 *
 * LinearPredictor fails when the observed tasks show no differences in
 * inputSize; ConstantPredictor can handle that case. CombiPredictor
 * therefore runs both and decides dynamically which prediction to apply.
*
* @author Florian Friederici
*
*/
//@formatter:on
@Slf4j
public class CombiPredictor implements MemoryPredictor {

ConstantPredictor constantPredictor;
LinearPredictor linearPredictor;

public CombiPredictor() {
this.constantPredictor = new ConstantPredictor();
this.linearPredictor = new LinearPredictor();
}

@Override
public void addObservation(Observation o) {
log.debug("CombiPredictor.addObservation({})", o);
constantPredictor.addObservation(o);
linearPredictor.addObservation(o);
}

@Override
public String queryPrediction(Task task) {
String taskName = task.getConfig().getTask();
log.debug("CombiPredictor.queryPrediction({},{})", taskName, task.getInputSize());

String constantPrediction = constantPredictor.queryPrediction(task);
String linearPrediction = linearPredictor.queryPrediction(task);

if (constantPrediction==null && linearPrediction==null) {
// no prediction available at all
return null;
}

if (constantPrediction!=null && linearPrediction==null) {
// only the constantPrediction is available
return constantPrediction;
}

if (constantPrediction==null && linearPrediction!=null) {
// only the linearPrediction is available (unusual case)
return linearPrediction;
}

log.debug("constantPrediction={}, linearPrediction={}, difference={}", constantPrediction, linearPrediction, new BigDecimal(constantPrediction).subtract(new BigDecimal(linearPrediction)));

// prefer linearPrediction if both would be available
return linearPrediction;
}

}
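The decision rule above can be sketched in isolation. Task and Observation are scheduler types, so plain strings stand in for predictions here; this is an illustrative sketch, not code from the repository:

```java
// Sketch of CombiPredictor's selection rule, isolated from the scheduler:
// prefer the linear prediction whenever it exists, otherwise fall back to
// the constant one; null means "no prediction available at all".
public class CombiSelectionSketch {

    static String select(String constantPrediction, String linearPrediction) {
        if (linearPrediction != null) {
            return linearPrediction; // preferred when both are available
        }
        return constantPrediction;   // may itself be null
    }

    public static void main(String[] args) {
        System.out.println(select("1024", "900"));  // linear wins: 900
        System.out.println(select("1024", null));   // constant fallback: 1024
        System.out.println(select(null, null));     // no prediction: null
    }
}
```

Because the constant prediction is only returned when the linear one is null, the four if-branches in the method above collapse to this single preference rule.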
98 changes: 98 additions & 0 deletions src/main/java/cws/k8s/scheduler/memory/ConstantPredictor.java
@@ -0,0 +1,98 @@
/*
* Copyright (c) 2023, Florian Friederici. All rights reserved.
*
* This code is free software: you can redistribute it and/or modify it under
* the terms of the GNU General Public License as published by the Free
* Software Foundation, either version 3 of the License, or (at your option)
* any later version.
*
* This code is distributed in the hope that it will be useful, but WITHOUT ANY
* WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
* FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
* details.
*
* You should have received a copy of the GNU General Public License along with
* this work. If not, see <https://www.gnu.org/licenses/>.
*/

package cws.k8s.scheduler.memory;

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;

import cws.k8s.scheduler.model.Task;
import lombok.extern.slf4j.Slf4j;

// @formatter:off
/**
 * ConstantPredictor uses the following strategy:
 *
 * - In case the task was successful:
 *   - let the next prediction be 10% higher than the peakRss was
 *
 * - In case the task has failed:
 *   - reset to the initial value
 *
 * I.e. the suggestions from ConstantPredictor do not depend on the input
 * size of the tasks.
*
* @author Florian Friederici
*
*/
// @formatter:on
@Slf4j
class ConstantPredictor implements MemoryPredictor {

Map<String, BigDecimal> model;
Map<String, BigDecimal> initialValue;

public ConstantPredictor() {
model = new HashMap<>();
initialValue = new HashMap<>();
}

@Override
public void addObservation(Observation o) {
log.debug("ConstantPredictor.addObservation({})", o);
if (!TaskScaler.checkObservationSanity(o)) {
log.warn("dismiss observation {}", o);
return;
}

// store initial ramRequest value per task
if (!initialValue.containsKey(o.task)) {
initialValue.put(o.task, o.getRamRequest());
}

if (Boolean.TRUE.equals(o.success)) {
// set model to peakRss + 10%
if (model.containsKey(o.task)) {
model.replace(o.task, o.peakRss.multiply(new BigDecimal("1.1")).setScale(0, RoundingMode.CEILING));
[Review comment — Member]
Shouldn't the constant predictor always use the highest value ever seen? This replaces it with the most recent one.

[Reply — Contributor Author]
The current behaviour is:
  - In case the task was successful: let the next prediction be 10% higher than the peakRss was
  - In case the task has failed: reset to the initial value
If the scheduler provides the tasks in order, with the tasks that have the biggest inputs first, the prediction will follow and always shrink. But I agree that different "constant" strategies could be taken, e.g.:
  - constant, biggest value seen
  - constant, lowest value seen
  - constant, latest value seen  <- current approach

[Reply — Member]
Ordering by size descending is only one of many possible scheduling strategies; ordering could also be ascending, random, or FIFO. We should use the maximum with an x% offset.

} else {
model.put(o.task, o.peakRss.multiply(new BigDecimal("1.1")).setScale(0, RoundingMode.CEILING));
}
} else {
// reset to initialValue
if (model.containsKey(o.task)) {
model.replace(o.task, this.initialValue.get(o.task));
} else {
model.put(o.task, o.ramRequest.multiply(new BigDecimal(2)).setScale(0, RoundingMode.CEILING));
}
}

}

@Override
public String queryPrediction(Task task) {
String taskName = task.getConfig().getTask();
log.debug("ConstantPredictor.queryPrediction({})", taskName);

if (model.containsKey(taskName)) {
return model.get(taskName).toPlainString();
} else {
return null;
}
}
}
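The update arithmetic, together with the "biggest value seen" variant discussed in the review thread above, can be sketched stand-alone. The class and method names here are illustrative, not from the repository:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;

// Stand-alone sketch of ConstantPredictor's arithmetic. nextPrediction()
// is the rule used above (latest peakRss + 10%, rounded up to a whole
// value); maxSeenPrediction() is the reviewer's suggested variant that
// keeps the running maximum peakRss per task instead of the latest one.
public class ConstantRuleSketch {

    static final Map<String, BigDecimal> maxPeak = new HashMap<>();

    // current rule: latest peakRss + 10%, ceiling to an integer value
    static BigDecimal nextPrediction(BigDecimal peakRss) {
        return peakRss.multiply(new BigDecimal("1.1"))
                      .setScale(0, RoundingMode.CEILING);
    }

    // suggested variant: remember the maximum peakRss ever seen per task
    static BigDecimal maxSeenPrediction(String task, BigDecimal peakRss) {
        maxPeak.merge(task, peakRss, BigDecimal::max);
        return nextPrediction(maxPeak.get(task));
    }

    public static void main(String[] args) {
        // 1001 * 1.1 = 1101.1, rounded up to 1102
        System.out.println(nextPrediction(new BigDecimal("1001")));
        maxSeenPrediction("foo", new BigDecimal("2000"));
        // a smaller later observation does not lower the max-seen variant
        System.out.println(maxSeenPrediction("foo", new BigDecimal("1500"))); // 2200
    }
}
```

The ceiling rounding matters for memory requests: rounding down could produce a value just below the observed peak, whereas rounding up only ever costs a single byte of extra headroom.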
108 changes: 108 additions & 0 deletions src/main/java/cws/k8s/scheduler/memory/LinearPredictor.java
@@ -0,0 +1,108 @@
/*
* Copyright (c) 2023, Florian Friederici. All rights reserved.
*
* This code is free software: you can redistribute it and/or modify it under
* the terms of the GNU General Public License as published by the Free
* Software Foundation, either version 3 of the License, or (at your option)
* any later version.
*
* This code is distributed in the hope that it will be useful, but WITHOUT ANY
* WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
* FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
* details.
*
* You should have received a copy of the GNU General Public License along with
* this work. If not, see <https://www.gnu.org/licenses/>.
*/

package cws.k8s.scheduler.memory;

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.math3.stat.regression.SimpleRegression;

import cws.k8s.scheduler.model.Task;
import lombok.extern.slf4j.Slf4j;

//@formatter:off
/**
 * LinearPredictor uses the following strategy:
 *
 * If there are fewer than 2 observations, give no prediction; otherwise,
 * calculate a linear regression model and provide predictions.
 *
 * Predictions start with 10% over-provisioning. If tasks fail, this
 * increases automatically.
*
* @author Florian Friederici
*
*/
//@formatter:on
@Slf4j
public class LinearPredictor implements MemoryPredictor {

Map<String, SimpleRegression> model;
Map<String, Double> overprovisioning;

public LinearPredictor() {
model = new HashMap<>();
overprovisioning = new HashMap<>();
}

@Override
public void addObservation(Observation o) {
log.debug("LinearPredictor.addObservation({})", o);
if (!TaskScaler.checkObservationSanity(o)) {
log.warn("dismiss observation {}", o);
return;
}

if (!overprovisioning.containsKey(o.task)) {
overprovisioning.put(o.task, 1.1);
}

if (Boolean.TRUE.equals(o.success)) {
if (!model.containsKey(o.task)) {
model.put(o.task, new SimpleRegression());
}

double x = o.getInputSize();
double y = o.getPeakRss().doubleValue();
model.get(o.task).addData(x,y);
} else {
log.debug("overprovisioning value will increase due to task failure");
Double old = overprovisioning.get(o.task);
overprovisioning.put(o.task, old+0.05);
[Review comment — Member]
I assume this will increase at the beginning, where we might make a few wrong predictions. However, the overprovisioning is never decreased once we have more observations and maybe better predictions. We can also leave this for the future, as this is a very cautious approach.

[Reply — Contributor Author]
The overprovisioning was my first attempt at solving the problem of too-low estimates. Later I learned that errorStrategy is set to terminate by default and maxRetries is very low (I believe 1 by default), so in practice this value will rarely grow very much.

[Reply — Member]
How about a strategy that checks the predictions for each task in the training data and determines the highest offset needed to fit all/95%/99% of the values? The percentage could be a user-defined hyperparameter.

}
}

@Override
public String queryPrediction(Task task) {
String taskName = task.getConfig().getTask();
log.debug("LinearPredictor.queryPrediction({},{})", taskName, task.getInputSize());

if (!model.containsKey(taskName)) {
log.debug("LinearPredictor has no model for {}", taskName);
return null;
}

SimpleRegression simpleRegression = model.get(taskName);
double prediction = simpleRegression.predict(task.getInputSize());

if (Double.isNaN(prediction)) {
log.debug("No prediction possible for {}", taskName);
return null;
}

if (prediction < 0) {
log.warn("prediction would be negative: {}", prediction);
return null;
}

return BigDecimal.valueOf(prediction).multiply(BigDecimal.valueOf(overprovisioning.get(taskName))).setScale(0, RoundingMode.CEILING).toPlainString();
}

}
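The regression strategy can be sketched without the commons-math3 dependency: a hand-rolled ordinary-least-squares fit stands in for SimpleRegression, while the overprovisioning scaling and ceiling rounding mirror the code above. This is a dependency-free sketch, not the repository's implementation:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Miniature LinearPredictor: fit y = intercept + slope * x over
// (inputSize, peakRss) observations, then scale the prediction by an
// overprovisioning factor and round up, with the same NaN/negative
// guards as the class above.
public class LinearSketch {
    static double slope, intercept;

    // ordinary least squares over the observed (x, y) pairs
    static void fit(double[] x, double[] y) {
        double mx = 0, my = 0;
        for (int i = 0; i < x.length; i++) { mx += x[i]; my += y[i]; }
        mx /= x.length; my /= y.length;
        double num = 0, den = 0;
        for (int i = 0; i < x.length; i++) {
            num += (x[i] - mx) * (y[i] - my);
            den += (x[i] - mx) * (x[i] - mx);
        }
        slope = num / den;
        intercept = my - slope * mx;
    }

    static String predict(double inputSize, double overprovisioning) {
        double p = intercept + slope * inputSize;
        if (Double.isNaN(p) || p < 0) return null; // same guards as above
        return BigDecimal.valueOf(p)
                .multiply(BigDecimal.valueOf(overprovisioning))
                .setScale(0, RoundingMode.CEILING).toPlainString();
    }

    public static void main(String[] args) {
        // two observations: input 100 -> peak 200, input 200 -> peak 300
        fit(new double[] {100, 200}, new double[] {200, 300});
        // input 150 -> raw prediction 250, * 1.1 overprovisioning = 275
        System.out.println(predict(150, 1.1));
    }
}
```

Note that `den` is zero when all observed input sizes are identical, making the prediction NaN; that is exactly the case the Javadoc of CombiPredictor describes, where ConstantPredictor has to take over.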
61 changes: 61 additions & 0 deletions src/main/java/cws/k8s/scheduler/memory/MemoryPredictor.java
@@ -0,0 +1,61 @@
/*
* Copyright (c) 2023, Florian Friederici. All rights reserved.
*
* This code is free software: you can redistribute it and/or modify it under
* the terms of the GNU General Public License as published by the Free
* Software Foundation, either version 3 of the License, or (at your option)
* any later version.
*
* This code is distributed in the hope that it will be useful, but WITHOUT ANY
* WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
* FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
* details.
*
* You should have received a copy of the GNU General Public License along with
* this work. If not, see <https://www.gnu.org/licenses/>.
*/

package cws.k8s.scheduler.memory;

import cws.k8s.scheduler.model.Task;

// @formatter:off
/**
* The MemoryPredictor has two important interfaces:
*
 * 1) addObservation()
 *    - "add a new observation": after a workflow task has finished, the
 *      observation result is collected in the MemoryPredictor
 *
 * 2) queryPrediction()
 *    - "ask for a suggestion": at any time, the MemoryPredictor can be
 *      asked what its guess is for the resource requirements of a task
*
* Different strategies can be tried and exchanged easily, they just have to
* implement those two interfaces. See ConstantPredictor and LinearPredictor
* for concrete strategies.
*
* @author Florian Friederici
*
*/
// @formatter:on
interface MemoryPredictor {

/**
 * feed an observation into the MemoryPredictor; it is used to learn the
 * memory usage of tasks and to create suggestions
*
* @param o the observation that was made
*/
void addObservation(Observation o);

/**
* ask the MemoryPredictor for a suggestion on how much memory should be
* assigned to the task.
*
 * @param task the task to get a suggestion for
 * @return null if no suggestion is possible, otherwise the value to be used
*/
String queryPrediction(Task task);

}
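The simplest strategy satisfying this contract is a predictor that never makes a suggestion (the commit history mentions a NonePredictor, which presumably behaves like this). In this sketch the scheduler's Task and Observation types are replaced by plain strings so it stands alone:

```java
// Minimal shape of a MemoryPredictor implementation. Ignoring every
// observation and always answering null is a valid strategy: null means
// "no suggestion", so the scheduler keeps the workflow's own requests.
public class NonePredictorSketch {

    public void addObservation(String observation) {
        // intentionally ignore all observations
    }

    public String queryPrediction(String taskName) {
        return null; // null means "no suggestion", per the interface contract
    }

    public static void main(String[] args) {
        NonePredictorSketch p = new NonePredictorSketch();
        p.addObservation("task=foo peakRss=1024");
        System.out.println(p.queryPrediction("foo")); // null
    }
}
```

Because callers must handle the null return anyway, a no-op predictor needs no special-casing in the scheduler: it simply never triggers a resize.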