Only include necessary information in prompt for GenAI metrics #10698
Conversation
Force-pushed from 737d440 to f8eda62
Signed-off-by: Roshni Malani <[email protected]>
@prithvikannan @annzhang-db Ready for review. Thanks!
@dbczumar Can you help review this as well? I wanted to confirm the API changes here and see whether they make sense, or whether the suggestions warrant further discussion.
Thank you @rmalani-db for working through this; sorry for the back and forth.
mlflow/metrics/genai/genai_metric.py (Outdated)
@@ -229,6 +231,7 @@ def process_example(example):
     name,
     definition,
     grading_prompt,
+    include_input,
On line 243, where `inputs` is passed to `eval_fn` as an argument, should we remove `inputs` as a required field? That would also make it optional in the signature. Thoughts?
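To make the question concrete, here is a hypothetical sketch of what that signature change could look like (the parameter names follow the surrounding diff; the types and defaults are assumptions, not the actual MLflow code):

```python
from typing import Optional

import pandas as pd

# Hypothetical sketch only: `inputs` becomes optional, so metrics
# built to exclude the input need not receive it at all.
def eval_fn(
    predictions: pd.Series,
    metrics: dict,
    inputs: Optional[pd.Series] = None,
    *args,
):
    ...
```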
mlflow/metrics/genai/base.py (Outdated)
@@ -72,17 +72,23 @@ def _format_grading_context(self):
         else:
             return self.grading_context

-    def __str__(self) -> str:
+    def print(self, include_input: bool = True) -> str:
Is there a reason we are using `include_input` in the example rather than just not passing in `input`? Basically, making `input` optional here? Sorry, maybe I missed this decision :(
We can make `input` optional here in `EvaluationExample`, but users can still provide it (in case they are sharing examples between different metrics). We should still exclude the input if the metric excludes it. I believe that's the decision in Option C in the linked Jira ticket.
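A minimal sketch of that Option C behavior, using a simplified stand-in for `EvaluationExample` (the field names follow the MLflow class, but this is an illustration, not the actual implementation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvaluationExample:
    output: str
    score: float
    justification: str
    input: Optional[str] = None        # optional, so examples can be shared across metrics
    grading_context: Optional[dict] = None

    def render(self, include_input: bool = True) -> str:
        parts = []
        # Even if input was provided, drop it when the metric excludes it.
        if include_input and self.input is not None:
            parts.append(f"Input: {self.input}")
        parts.append(f"Output: {self.output}")
        parts.append(f"Score: {self.score}")
        parts.append(f"Justification: {self.justification}")
        return "\n".join(parts)
```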
mlflow/metrics/genai/prompts/v1.py (Outdated)
@@ -64,6 +84,7 @@ class EvaluationModel:
     name: str
     definition: str
     grading_prompt: str
+    include_input: bool = True
Do we need to change this here as well? If we just don't pass `input` to the evaluation model, it doesn't render it. We can do it similarly to `grading_context_columns`. Basically, we can try to avoid adding more APIs.
Force-pushed from 66bfa85 to 0f509bd
Signed-off-by: Roshni Malani <[email protected]>
I tried to simplify the API surface affected by the optional `include_input` parameter. In the process, I also simplified how `PromptTemplate` works to allow for optional variables that are None. Please take another look @sunishsheth2009, thanks.
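A minimal sketch of what "optional variables that are None" could mean for a section-based template; this illustrates the idea under stated assumptions, and MLflow's actual `PromptTemplate` may differ:

```python
import re

class PromptTemplate:
    """Toy section-based template: a section whose variables are None is dropped."""

    def __init__(self, sections: list[str]):
        self.sections = sections

    def format(self, **values) -> str:
        rendered = []
        for section in self.sections:
            variables = re.findall(r"\{(\w+)\}", section)
            # Skip the whole section if any of its variables is missing or None.
            if any(values.get(v) is None for v in variables):
                continue
            rendered.append(section.format(**{v: values[v] for v in variables}))
        return "".join(rendered)

template = PromptTemplate(["Input:\n{input}\n\n", "Output:\n{output}\n"])
print(template.format(input=None, output="MLflow is an ML platform."))
# Only the "Output:" section is rendered, since input is None.
```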
@@ -280,7 +290,9 @@ def eval_fn(
     )
     grading_payloads.append(
         evaluation_context["eval_prompt"].format(
-            input=input, output=output, grading_context_columns=arg_string
+            input=(input if include_input else None),
Now the eval context doesn't need to have an explicit `include_input` parameter.
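Tying the two pieces together: with a template like the sketch earlier in this thread, the diff's `input=(input if include_input else None)` is enough to drop the input section entirely (again assuming the toy `PromptTemplate` above, not the real one):

```python
eval_prompt = PromptTemplate(["Input:\n{input}\n\n", "Output:\n{output}\n"])

include_input = False
payload = eval_prompt.format(
    input=("What is MLflow?" if include_input else None),
    output="MLflow is an ML platform.",
)
# payload contains only the "Output:" section; no include_input flag
# needs to live in the evaluation context itself.
```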
…w#10698) Signed-off-by: Roshni Malani <[email protected]> Co-authored-by: Roshni Malani <[email protected]>
Related Issues/PRs
What changes are proposed in this pull request?
Add an `include_input` parameter to the `make_genai_metric()` method to allow custom metrics that only need output and context to exclude the input, allowing for a shorter prompt with only relevant information. The parameter defaults to `True` for backward compatibility, and users can specify `False` to exclude the input.
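For illustration, usage might look like this. This is a sketch based on the PR's proposed API: `include_input` is the new parameter, the other arguments follow MLflow's existing `make_genai_metric` signature, and omitting `input` on the example relies on this PR making it optional:

```python
from mlflow.metrics.genai import EvaluationExample, make_genai_metric

example = EvaluationExample(
    # input omitted: this PR proposes making it optional on EvaluationExample
    output="MLflow is an open source platform for the ML lifecycle.",
    score=4,
    justification="The answer is accurate and grounded in the provided context.",
    grading_context={"context": "MLflow documentation excerpt..."},
)

answer_quality = make_genai_metric(
    name="answer_quality",
    definition="Measures how well the output answers using only the context.",
    grading_prompt="Score 1-5 based on accuracy against the provided context.",
    examples=[example],
    grading_context_columns=["context"],
    include_input=False,  # proposed in this PR: omit the input from the grading prompt
)
```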
How is this PR tested?
Does this PR require a documentation update?
Release Notes
Is this a user-facing change?
What component(s), interfaces, languages, and integrations does this PR affect?
Components
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging

Interface
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support

Language
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages

Integrations
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:
- rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
- rn/feature - A new user-facing feature worth mentioning in the release notes
- rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
- rn/documentation - A user-facing documentation change worth mentioning in the release notes