-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds tagging for sagemaker endpoints and sagemaker config. Issue #9159 #9310
Conversation
Documentation preview for 7acc172 will be available here when this CircleCI job completes successfully. More info
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
Overall looks good to me! only dropped some comments on the style.
Btw, could you help edit the title to include the context of this PR? It would be easier to track in the history (after squash and merge).
mlflow/sagemaker/__init__.py
Outdated
def _get_sagemaker_config_tags(endpoint_name): | ||
return [{"Key": "app_name", "Value": endpoint_name}] | ||
|
||
def _add_sagemaker_tags(tags, config_tags): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The args are little bit confusing, it's hard to tell the diff between tags
and config_tags
. We can either add a docstring to explain or use a more explicit name, e.g., tags
and app_name_tag
.
mlflow/sagemaker/__init__.py
Outdated
|
||
def _add_sagemaker_tags(tags, config_tags): | ||
""" | ||
Convert dict of tags to list for SageMaker and adds to config tags list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: period at the end
tests/sagemaker/mock/__init__.py
Outdated
model = next(model for model in self.models.values() if model.arn == resource_arn) | ||
return model.resource.tags | ||
resource_values = getattr(self, resource_type).values() | ||
sagemaker_resource = next( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually if we need to break inline code to multiple lines, it indicates we should replace the inline style with the normal for loop for readability. could you help refactor? thanks!
mlflow/sagemaker/__init__.py
Outdated
""" | ||
if sagemaker_tags: | ||
sagemaker_tags = [{"Key": key, "Value": str(value)} for key, value in sagemaker_tags.items()] | ||
config_tags.extend(sagemaker_tags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this line do?
The return fo the private function is sagemaker_tags
, which is handled entirely by lines 1349 and 1350.
Was the intention to return config_tags ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_prepare_sagemaker_tags({"a_config_tag": "a_value", "my_custom_tag": "my_custom_value"}, [{"Key": "app_name", "Value": "my_app"}])
[{'Key': 'a_config_tag', 'Value': 'a_value'},
{'Key': 'my_custom_tag', 'Value': 'my_custom_value'}]
If this is the intended output, then there is no need to have config_tags
(which is the structure [{"Key": "app_name", "Value": <endpoint_name>}]
defined and returned within _get_sagemaker_config_tags
)
mlflow/sagemaker/__init__.py
Outdated
return [{"Key": "app_name", "Value": endpoint_name}] | ||
|
||
def _prepare_sagemaker_tags( | ||
sagemaker_tags: Dict[str, str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the dict sagemaker_tags
contains the key "app_name"
? What does AWS do with multiple conflicting tags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, looks like the request would be rejected according to this documentation. I'll add a check for that.
mlflow/sagemaker/__init__.py
Outdated
sagemaker_tags = [{"Key": key, "Value": str(value)} for key, value in sagemaker_tags.items()] | ||
config_tags.extend(sagemaker_tags) | ||
|
||
return sagemaker_tags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would something like this perhaps be a little bit more robust?
from mlflow import MlflowException
from typing import List, Dict, Optional
SAGEMAKER_APP_NAME_TAG_KEY = "app_name"
def _get_sagemaker_config_tags(endpoint_name):
return [{"Key": SAGEMAKER_APP_NAME_TAG_KEY, "Value": endpoint_name}]
def _prepare_sagemaker_tags(
config_tags: List[Dict[str, str]],
sagemaker_tags: Optional[Dict[str, str]]=None,
):
if not sagemaker_tags:
return config_tags
if SAGEMAKER_APP_NAME_TAG_KEY in sagemaker_tags:
raise MlflowException.invalid_parameter_value(f"Duplicate tag provided for '{SAGEMAKER_APP_NAME_TAG_KEY}'")
parsed = [{"Key": key, "Value": str(value)} for key, value in sagemaker_tags.items()]
return config_tags + parsed
Validating the behavior:
_prepare_sagemaker_tags([{"Key": "app_name", "Value": "my_app"}], {"a": "1", "b": "2", "c": "3"})
[{'Key': 'app_name', 'Value': 'my_app'},
{'Key': 'a', 'Value': '1'},
{'Key': 'b', 'Value': '2'},
{'Key': 'c', 'Value': '3'}]
_prepare_sagemaker_tags([{"Key": "app_name", "Value": "my_app"}], {"app_name": "my_better_app", "b": "2", "c": "3"})
---------------------------------------------------------------------------
MlflowException Traceback (most recent call last)
/var/folders/cd/n8n0rm2x53l_s0xv_j_xklb00000gp/T/ipykernel_12446/3588563336.py in <cell line: 1>()
----> 1 _prepare_sagemaker_tags([{"Key": "app_name", "Value": "my_app"}], {"app_name": "my_better_app", "b": "2", "c": "3"})
/var/folders/cd/n8n0rm2x53l_s0xv_j_xklb00000gp/T/ipykernel_12446/3590447905.py in _prepare_sagemaker_tags(config_tags, sagemaker_tags)
14
15 if SAGEMAKER_APP_NAME_TAG_KEY in sagemaker_tags:
---> 16 raise MlflowException.invalid_parameter_value(f"Duplicate tag provided for '{SAGEMAKER_APP_NAME_TAG_KEY}'")
17 parsed = [{"Key": key, "Value": str(value)} for key, value in sagemaker_tags.items()]
18
MlflowException: Duplicate tag provided for 'app_name'
_prepare_sagemaker_tags([{"Key": "app_name", "Value": "my_app"}], None)
_prepare_sagemaker_tags([{"Key": "app_name", "Value": "my_app"}], {})
[{'Key': 'app_name', 'Value': 'my_app'}]
Does this reflect the intended functionality for the _prepare_sagemaker_tags()
function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this looks great.
tests/sagemaker/mock/__init__.py
Outdated
if "model" in arn: | ||
sagemaker_resource = "models" | ||
elif "endpoint" in arn: | ||
sagemaker_resource = "endpoints" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be modified to a ternary operator.
model = next(model for model in self.models.values() if model.arn == resource_arn) | ||
return model.resource.tags | ||
resource_values = getattr(self, resource_type).values() | ||
for sagemaker_resource in resource_values: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this being modified to a for loop? is self.resource_type.values()
not accessible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not accessible because resource_type
is not a valid attribute name. It's a variable containing a string value which is why line 526 was added. The original inline code was broken out to a for loop based on a previous review to improve readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add tests that validate the correctness of supplied tags and that parsing is performed correctly for the conversion from dict -> AWS-specified List[Dict[str, str]] format. The current implementation seems like it is dropping data.
Hi @clarkh-ncino could you rebase to master to address the merge conflicts (and then I can initiate the CI suite for you) |
Signed-off-by: Clark Hollar <[email protected]>
Signed-off-by: Clark Hollar <[email protected]>
Signed-off-by: Clark Hollar <[email protected]>
Signed-off-by: Clark Hollar <[email protected]>
This reverts commit 3225a11. Signed-off-by: Clark Hollar <[email protected]>
…or loop for readability Signed-off-by: Clark Hollar <[email protected]>
Signed-off-by: Clark Hollar <[email protected]>
Signed-off-by: Clark Hollar <[email protected]>
Signed-off-by: Clark Hollar <[email protected]>
Should be good to go whenever you have a moment. 👍 @BenWilson2 |
@BenWilson2 @chenmoneygithub anything I can do here to help move this forward? Think we just need to kick of the CI build. |
@clarkh-ncino could you rebase to master again? The issues that you're encountering should be resolved (you'll also need to fix those lint failures, though with import ordering and the black formatting issues). If you run the pre-commit hook ( |
Signed-off-by: Clark Hollar <[email protected]>
fixing lint issues and removing unused variable
@BenWilson2 Should be good to go now 🤞 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…ow#9159 (mlflow#9310) Signed-off-by: Clark Hollar <[email protected]>
…ow#9159 (mlflow#9310) Signed-off-by: Clark Hollar <[email protected]>
Related Issues/PRs
Resolve #9159
What changes are proposed in this pull request?
Fixing a bug where tags were not being added to sagemaker endpoints during deployment. Also added logic to add custom tags to sagemaker config resources as well.
How is this patch tested?
Does this PR change the documentation?
Release Notes
Is this a user-facing change?
(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
Users of the SageMaker deployment functionality can now configure custom tags for both endpoints and endpoint configuration resources from both the CLI and Python SDK.
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts
: Artifact stores and artifact loggingarea/build
: Build and test infrastructure for MLflowarea/docs
: MLflow documentation pagesarea/examples
: Example codearea/gateway
: AI Gateway service, Gateway client APIs, third-party Gateway integrationsarea/model-registry
: Model Registry service, APIs, and the fluent client calls for Model Registryarea/models
: MLmodel format, model serialization/deserialization, flavorsarea/recipes
: Recipes, Recipe APIs, Recipe configs, Recipe Templatesarea/projects
: MLproject format, project running backendsarea/scoring
: MLflow Model server, model deployment tools, Spark UDFsarea/server-infra
: MLflow Tracking server backendarea/tracking
: Tracking Service, tracking client APIs, autologgingInterface
area/uiux
: Front-end, user experience, plotting, JavaScript, JavaScript dev serverarea/docker
: Docker use across MLflow's components, such as MLflow Projects and MLflow Modelsarea/sqlalchemy
: Use of SQLAlchemy in the Tracking Service or Model Registryarea/windows
: Windows supportLanguage
language/r
: R APIs and clientslanguage/java
: Java APIs and clientslanguage/new
: Proposals for new client languagesIntegrations
integrations/azure
: Azure and Azure ML integrationsintegrations/sagemaker
: SageMaker integrationsintegrations/databricks
: Databricks integrationsHow should the PR be classified in the release notes? Choose one:
rn/breaking-change
- The PR will be mentioned in the "Breaking Changes" sectionrn/none
- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/feature
- A new user-facing feature worth mentioning in the release notesrn/bug-fix
- A user-facing bug fix worth mentioning in the release notesrn/documentation
- A user-facing documentation change worth mentioning in the release notes