Enhance `predict` API to serve for env validation purpose. #10759

B-Step62 · 2023-12-27T11:51:20Z

🛠 DevTools 🛠

Install mlflow from this PR

pip install git+https://github.com/mlflow/mlflow.git@refs/pull/10759/merge

Checkout with GitHub CLI

gh pr checkout 10759

What changes are proposed in this pull request?

Enhance the mlflow models predict CLI command so that users can more easily use it to validate the model environment before deployment.

Add a Python API for convenience in notebook environments. It is almost the same as the CLI, but also accepts a serialized JSON/CSV string as input data (the CLI only supports a file path or stdin).
Show a better error message and guidance when some packages are missing from the model's dependencies. Basically, introduce the usage of extra_pip_requirements.
Add a pip-requirement-override argument to both the Python and CLI APIs, so that users can try additional or updated dependencies without having to re-create and re-log the model. Because Python reports missing packages one by one, this process can otherwise be very tedious.
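
The point that Python reports missing packages one by one can be illustrated with a minimal sketch. This is not MLflow code: first_missing_module and the hypothetical_missing_pkg_* names are made up for illustration, and the sketch assumes those packages are not installed. Python stops at the first failing import, so each missing dependency only surfaces after the previous one has been fixed.

```python
import importlib
from typing import List, Optional


def first_missing_module(module_names: List[str]) -> Optional[str]:
    """Import modules in order; return the first missing one, or None."""
    for name in module_names:
        try:
            importlib.import_module(name)
        except ModuleNotFoundError as exc:
            # Python raises on the first failure, so later gaps stay hidden.
            return exc.name
    return None


# Hypothetical dependency list; the last two names are assumed not installed.
required = ["json", "hypothetical_missing_pkg_a", "hypothetical_missing_pkg_b"]
print(first_missing_module(required))  # hypothetical_missing_pkg_a
```

Only the first missing name is reported; the second one would show up on the next attempt. Being able to try a changed dependency set without re-logging the model shortens each of those attempts.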

Note
Will work on OSS/Databricks docs as a follow-up.

How is this PR tested?

Existing unit/integration tests
New unit/integration tests
Manual tests

Verified that the functionality works in Databricks. Note that creating the virtualenv takes a while the first time (0:20~0:45 in the video).

Screen.Recording.2023-12-27.at.20.42.35.mov

In Databricks, I was able to test only with virtualenv because conda is not installed there. For conda, I tested on a devbox and it worked the same way.

Does this PR require documentation update?

Will work on doc update.

Release Notes

Is this a user-facing change?

Yes.

Enhanced the predict API for MLflow Models so it can be used to validate the inference environment before model deployment: (1) added a Python API for notebook convenience, (2) introduced a pip-requirement-overrides argument to test dependency changes, and (3) enriched the error messages.


B-Step62 · 2023-12-28T15:39:13Z

mlflow/models/python_api.py

+from mlflow.utils.file_utils import TempDir
+
+
+def build_docker(


build_docker is unchanged from its original definition in mlflow/models/__init__.py. It was just moved to avoid complicating __init__.py.

B-Step62 · 2023-12-29T01:29:40Z

mlflow/models/python_api.py

+    model_uri: str,
+    # TODO: This is currently subset of PyfuncInput, ideally we should cover all
+    input_data: Union[str, Dict[str, Any], List[Any], "pd.DataFrame", None] = None,  # noqa: F821
+    input_path: Optional[str] = None,


Separated the input_path and input_data arguments, because pandas read_csv combined with StringIO is too permissive. For example, pd.read_csv(StringIO("some-incorrect-file-path.csv")) is read as a single-column DataFrame (whereas we want to report that the file does not exist).
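
The permissiveness described above can be reproduced with a short sketch. It uses only pandas and the standard library, not the PR's code; the variable name candidate is chosen here for illustration.

```python
import os
from io import StringIO

import pandas as pd

candidate = "some-incorrect-file-path.csv"

# Treated as CSV content, the string parses without error:
# the whole text becomes the header of a one-column, zero-row frame.
df = pd.read_csv(StringIO(candidate))
print(df.shape)          # (0, 1)
print(list(df.columns))  # ['some-incorrect-file-path.csv']

# Treated as a path, the same string can be checked explicitly.
# This prints False unless such a file happens to exist in the working directory.
print(os.path.exists(candidate))
```

With a single combined argument there is no reliable way to tell a mistyped path from one-line CSV content, which is the ambiguity that two separate arguments avoid.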

Good idea; the API makes sense and is clear.

B-Step62 · 2023-12-29T01:30:37Z

mlflow/models/python_api.py

+        raise MlflowException.invalid_parameter_value(
+            "Both input_data and input_path are provided. Only one of them should be specified."
+        )
+    elif input_data is not None:


This is not written as elif input_data: because input_data can be a pandas DataFrame, which cannot be cast to a single boolean.
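
A minimal sketch of why the check uses is not None follows. It is an illustration under assumptions, not the PR's implementation: FrameLike is a hypothetical stand-in that mimics the DataFrame behavior of raising from __bool__, pick_input_source is a made-up helper, and ValueError replaces MlflowException so the sketch runs without MLflow installed.

```python
from typing import Any, Optional


class FrameLike:
    """Hypothetical stand-in for pandas.DataFrame: its truth value is ambiguous."""

    def __bool__(self) -> bool:
        raise ValueError("The truth value of a FrameLike is ambiguous.")


def pick_input_source(input_data: Any = None, input_path: Optional[str] = None) -> str:
    """Return which argument supplies the input: 'data', 'path', or 'neither'."""
    if input_data is not None and input_path is not None:
        raise ValueError(
            "Both input_data and input_path are provided. "
            "Only one of them should be specified."
        )
    elif input_data is not None:  # identity check, never calls __bool__
        return "data"
    elif input_path is not None:
        return "path"
    return "neither"


frame = FrameLike()
print(pick_input_source(input_data=frame))  # data

try:
    if frame:  # what `elif input_data:` would evaluate
        pass
except ValueError as exc:
    print(f"truthiness check failed: {exc}")
```

The identity comparison works for strings, dicts, lists, and frame-like objects alike, while a bare truthiness test would also wrongly skip valid but empty inputs such as an empty list.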

B-Step62 · 2023-12-29T02:24:32Z

mlflow/utils/cli_args.py

        type=click.UNPROCESSED,
        callback=_resolve_env_manager,
        help=help_string,
    )


 ENV_MANAGER = _create_env_manager_option(
+    default=_EnvManager.VIRTUALENV,


Every CLI command was already using this default (by doing env_manager = env_manager or _EnvManager.VIRTUALENV).

mlflow/pyfunc/_mlflow_pyfunc_backend_predict.py

BenWilson2

Nice implementation!

Signed-off-by: B-Step62 <[email protected]>

Co-authored-by: Ben Wilson <[email protected]> Signed-off-by: Yuki Watanabe <[email protected]>

harupy

LGTM :)

B-Step62 requested a review from dbczumar December 27, 2023 11:51

github-actions bot added the area/models (MLmodel format, model serialization/deserialization, flavors) and rn/feature (Mention under Features in Changelogs) labels Dec 27, 2023

B-Step62 mentioned this pull request Dec 27, 2023 in: [WIP] API for validating inference environment #10738 (Closed, 37 tasks)

B-Step62 commented Dec 27, 2023: mlflow/pyfunc/backend.py (outdated, resolved)

B-Step62 requested a review from harupy December 27, 2023 15:38

dbczumar reviewed Dec 28, 2023: mlflow/models/__init__.py (outdated, resolved)

dbczumar reviewed Dec 28, 2023: mlflow/models/__init__.py (outdated, resolved)

dbczumar reviewed Dec 28, 2023: mlflow/models/__init__.py (outdated, resolved)

dbczumar reviewed Dec 28, 2023: mlflow/pyfunc/_mlflow_pyfunc_backend_predict.py (outdated, resolved)

dbczumar reviewed Dec 28, 2023: mlflow/pyfunc/backend.py (outdated, resolved)

dbczumar reviewed Dec 28, 2023: mlflow/models/__init__.py (outdated, resolved)

B-Step62 commented Dec 28, 2023: mlflow/models/__init__.py (resolved)

B-Step62 commented Dec 29, 2023

B-Step62 requested a review from dbczumar December 29, 2023 02:24

BenWilson2 reviewed Jan 5, 2024: mlflow/pyfunc/_mlflow_pyfunc_backend_predict.py (outdated, resolved)

BenWilson2 reviewed Jan 5, 2024: mlflow/pyfunc/_mlflow_pyfunc_backend_predict.py (outdated, resolved)

BenWilson2 approved these changes Jan 5, 2024