This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Ensure admin never sets task resource request > limit #126

Merged: katrogan merged 4 commits into master on Sep 22, 2020

Contributor

@katrogan katrogan commented Sep 21, 2020

TL;DR

When a user specifies a task resource limit but not the request value, admin substitutes the platform-configured default for the request. In some cases this default can exceed the user-specified limit, which is nonsensical and prevents Kubernetes from ever scheduling the task.
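The intended behaviour can be illustrated with a minimal sketch. It assumes quantities are plain int64 milli-units and that an oversized default request is lowered to the limit; `resolveRequest` is a hypothetical name, and the actual change compares Kubernetes resource quantities rather than integers.

```go
package main

// resolveRequest is a hypothetical helper, not the function from this PR.
// Quantities are modelled as int64 milli-units (500 means 500m CPU) to keep
// the sketch dependency-free; that modelling is an assumption.
//
// It returns the request to use when the user supplied a limit but no
// request: the platform default, unless that default exceeds the limit,
// in which case the limit itself is used.
func resolveRequest(platformDefault, userLimit int64) int64 {
	if platformDefault > userLimit {
		// A request above the limit can never be scheduled, so cap it.
		return userLimit
	}
	return platformDefault
}
```

For example, a platform default of 1000m with a user limit of 500m resolves to 500m, while a default of 200m under the same limit is left unchanged.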

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

Fixes user-reported bug.

Tracking Issue

flyteorg/flyte#264

Follow-up issue

NA

codecov-commenter commented Sep 21, 2020

Codecov Report

Merging #126 into master will decrease coverage by 0.99%.
The diff coverage is 80.00%.

@@            Coverage Diff             @@
##           master     #126      +/-   ##
==========================================
- Coverage   62.26%   61.27%   -1.00%     
==========================================
  Files         105      105              
  Lines        7845     7267     -578     
==========================================
- Hits         4885     4453     -432     
+ Misses       2385     2247     -138     
+ Partials      575      567       -8     
Flag          Coverage Δ
#unittests    61.27% <80.00%> (-1.00%) ⬇️

Impacted Files                              Coverage Δ
pkg/manager/impl/execution_manager.go       68.57% <80.00%> (-0.64%) ⬇️
pkg/config/config.go                        16.66% <0.00%> (-10.61%) ⬇️
pkg/manager/impl/util/digests.go            28.57% <0.00%> (-8.93%) ⬇️
pkg/runtime/whitelist_provider.go           16.66% <0.00%> (-8.34%) ⬇️
pkg/rpc/config/flyte_client.go              50.00% <0.00%> (-7.15%) ⬇️
pkg/runtime/cluster_resource_provider.go    23.80% <0.00%> (-6.96%) ⬇️
pkg/runtime/cluster_config_provider.go      43.75% <0.00%> (-6.25%) ⬇️
pkg/runtime/namespace_config_provider.go    8.33% <0.00%> (-5.96%) ⬇️
pkg/runtime/task_resource_provider.go       11.11% <0.00%> (-5.56%) ⬇️
pkg/runtime/execution_queue_provider.go     11.11% <0.00%> (-5.56%) ⬇️
... and 95 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fab4226...68dfa9b.

@@ -263,6 +265,41 @@ func assignResourcesIfUnset(ctx context.Context, identifier *core.Identifier,
return resourceEntries
}

func resolveTaskLimitsAndPlatformRequestDefaults(ctx context.Context, identifier *core.Identifier,
Contributor

I don't see the code updating the limits, as the name suggests. Am I missing something?

Contributor Author

@katrogan katrogan Sep 21, 2020

Nope, I just struggled with naming. Is considerTaskLimitsAndPlatformRequestDefaults or reconcileTaskLimitsAndPlatformRequestDefaults any better? Other suggestions welcome.

Contributor

the limit is not supposed to change though... the request is.

Contributor Author

okay updated

wild-endeavor previously approved these changes Sep 21, 2020
Contributor

@wild-endeavor wild-endeavor left a comment

just add a comment to the function to clear up confusion?

@katrogan
Contributor Author

friendly ping @EngHabu @wild-endeavor @bnsblue

}
if quantity.Cmp(resource.MustParse(limitValue)) == 1 {
// The quantity is greater than the limit! Course correct below.
logger.Debugf(ctx, "Updating requested value for task [%+v] resource [%s]. Overriding to [%s] from [%s]",
@bnsblue bnsblue Sep 22, 2020

nit: can you add what causes this override to the log so that it's clearer when reading the logs?

Also I feel the debug level should be warning/info? Not quite sure 🤔

Contributor Author

updated & upgraded
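As an illustration of the request above, a log message that states the cause of the override might be built as follows. This is only a sketch: `overrideMessage` and its wording are hypothetical, and neither the final message text nor the log level chosen in the PR is shown in this conversation.

```go
package main

import "fmt"

// overrideMessage is a hypothetical helper that builds a log line naming the
// cause of the override (the platform default request exceeding the
// user-specified limit) alongside the old and new values.
func overrideMessage(task, resourceName, request, limit string) string {
	return fmt.Sprintf(
		"Platform default request [%s] for task [%s] resource [%s] exceeds the user-specified limit; overriding request to [%s]",
		request, task, resourceName, limit)
}
```

The resulting string could then be passed to whichever logger call and level the code settles on.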

},
},
}, resources))
})
@bnsblue bnsblue Sep 22, 2020

Can you add a test that tests the logic here? https://github.com/lyft/flyteadmin/pull/126/files#diff-fc047e54b9dd82ca7c89ac9b32cb07b3R282-R288
Specifically, can you add a test where one of the resources does not have the limit specified?

Contributor Author

@katrogan katrogan Sep 22, 2020

At the stage where the method is called, all values will have already been substituted with the application defaults (https://github.com/lyft/flyteadmin/pull/126/files#diff-fc047e54b9dd82ca7c89ac9b32cb07b3R342). I only added that check as a defensive safeguard.

Contributor Author

@katrogan katrogan Sep 22, 2020

Unless the user specifies a resource we don't provide defaults for. But in that case we don't hit the original problem, where the substituted request exceeds the limit: with no default request value, the result is a permissible configuration for scheduling.

From a unit-test point of view I think that it would still be nice to have a test for it since we don't know if in the future this function would be invoked through other code paths where the default limits were not pre-filled. But as you said it is not critical here.
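A test for the defensive path discussed in this thread could look roughly like the sketch below. It is not the test from the PR: `resolveRequests` is a hypothetical stand-in that models quantities as int64 milli-units keyed by resource name, and it assumes a request with no matching limit is left untouched.

```go
package main

import "testing"

// resolveRequests is a hypothetical stand-in for the function under review.
// For every requested resource it caps the request at the limit when a limit
// exists, and leaves the request unchanged when no limit was specified.
func resolveRequests(requests, limits map[string]int64) map[string]int64 {
	resolved := make(map[string]int64, len(requests))
	for name, request := range requests {
		limit, ok := limits[name]
		if ok && request > limit {
			// The request exceeds the user-specified limit: lower it.
			resolved[name] = limit
			continue
		}
		// Either no limit was given (defensive path) or the request fits.
		resolved[name] = request
	}
	return resolved
}

// TestResolveRequestsMissingLimit exercises the case where one resource has
// no limit specified while another has a limit below its request.
func TestResolveRequestsMissingLimit(t *testing.T) {
	requests := map[string]int64{"cpu": 2000, "gpu": 1}
	limits := map[string]int64{"cpu": 1000} // no limit for "gpu"

	resolved := resolveRequests(requests, limits)

	if got := resolved["cpu"]; got != 1000 {
		t.Errorf("cpu request = %d, want 1000 (capped at the limit)", got)
	}
	if got := resolved["gpu"]; got != 1 {
		t.Errorf("gpu request = %d, want 1 (unchanged, no limit set)", got)
	}
}
```

Covering the missing-limit branch directly keeps the function safe if it is later reached through a code path where defaults have not been pre-filled.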

bnsblue commented Sep 22, 2020

LGTM. Just some nits.

@katrogan katrogan merged commit 0fc9dbd into master Sep 22, 2020