
Commit

add example
Signed-off-by: Daniel Lok <[email protected]>
daniellok-db committed Nov 15, 2023
1 parent 5e34600 commit 1b57f81
Showing 3 changed files with 52 additions and 2 deletions.
Binary file added docs/source/_static/images/evaluate_metrics.png
6 changes: 6 additions & 0 deletions docs/source/llms/llm-evaluate/index.rst
@@ -125,6 +125,8 @@ There are two ways to select metrics to evaluate your model:
* Use **default** metrics for pre-defined model types.
* Use a **custom** list of metrics.
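As a rough illustration of the difference (not part of this change; the model URI, sample data, and built-in metric helpers below are placeholders to verify against your MLflow version), default metrics come from ``model_type``, while a custom list is passed through ``extra_metrics``:

.. code-block:: python

    import mlflow
    import pandas as pd

    # Hypothetical evaluation data; column names are placeholders.
    eval_df = pd.DataFrame(
        {
            "inputs": ["Summarize: MLflow tracks experiments, models, and artifacts."],
            "ground_truth": ["MLflow tracks experiments, models, and artifacts."],
        }
    )

    # Option 1: default metrics bundled with a pre-defined model type.
    default_results = mlflow.evaluate(
        model="runs:/<run_id>/model",  # hypothetical logged-model URI
        data=eval_df,
        targets="ground_truth",
        model_type="text-summarization",
    )

    # Option 2: an explicit list of metrics passed via extra_metrics.
    custom_results = mlflow.evaluate(
        model="runs:/<run_id>/model",
        data=eval_df,
        targets="ground_truth",
        extra_metrics=[mlflow.metrics.toxicity(), mlflow.metrics.latency()],
    )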

.. _llm-eval-default-metrics:

Use Default Metrics for Pre-defined Model Types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -173,6 +175,8 @@ The supported LLM model types and associated metrics are listed below:
:sup:`3` Requires packages `evaluate <https://pypi.org/project/evaluate>`_, `nltk <https://pypi.org/project/nltk>`_, and
`rouge-score <https://pypi.org/project/rouge-score>`_

.. _llm-eval-custom-metrics:

Use a Custom List of Metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -441,6 +445,8 @@ up OpenAI authentication to run the code below.
        model_type="question-answering",
    )

.. _llm-eval-static-dataset:

Evaluating with a Static Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

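Only the heading is visible here, so as a rough sketch of the idea (an assumption to check against the full guide), static evaluation passes a table of pre-computed outputs and names the predictions column instead of supplying a model:

.. code-block:: python

    import mlflow
    import pandas as pd

    # Hypothetical table of pre-computed model outputs; no model object is passed.
    static_df = pd.DataFrame(
        {
            "inputs": ["What is MLflow?"],
            "ground_truth": ["MLflow is an open source platform for the ML lifecycle."],
            "outputs": ["MLflow is an open source MLOps platform."],
        }
    )

    results = mlflow.evaluate(
        data=static_df,
        predictions="outputs",
        targets="ground_truth",
        model_type="question-answering",
    )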
48 changes: 46 additions & 2 deletions docs/source/llms/prompt-engineering/index.rst
@@ -296,9 +296,53 @@ or you can :ref:`deploy it for real-time serving <deploy-prompt-serving>`.
Step 13: Perform metric-based evaluation of your model's outputs
----------------------------------------------------------------
If you'd like to assess your model's performance on specific metrics, MLflow provides the :py:func:`mlflow.evaluate()`
API.
API. Let's evaluate our model on some :ref:`pre-defined metrics <llm-eval-default-metrics>`
for text summarization:

You can learn more about LLM evaluation at the :ref:`llm-eval` page.
.. code-block:: python

    import mlflow
    import pandas as pd

    logged_model = "runs:/840a5c43f3fb46f2a2059b761557c1d0/model"

    article_text = """
    An MLflow Project is a format for packaging data science code in a reusable and reproducible way.
    The MLflow Projects component includes an API and command-line tools for running projects, which
    also integrate with the Tracking component to automatically record the parameters and git commit
    of your source code for reproducibility.
    This article describes the format of an MLflow Project and how to run an MLflow project remotely
    using the MLflow CLI, which makes it easy to vertically scale your data science code.
    """
    question = "What is an MLflow project?"

    data = pd.DataFrame(
        {
            "article": [article_text],
            "question": [question],
            "ground_truth": [article_text],  # used for certain evaluation metrics, such as ROUGE score
        }
    )

    with mlflow.start_run():
        results = mlflow.evaluate(
            model=logged_model,
            data=data,
            targets="ground_truth",
            model_type="text-summarization",
        )

    eval_table = results.tables["eval_results_table"]
    print(f"See evaluation table below: \n{eval_table}")

The evaluation results can also be viewed in the MLflow Evaluation UI:

.. figure:: ../../_static/images/evaluate_metrics.png
   :scale: 40%
   :align: center

The :py:func:`mlflow.evaluate()` API also supports :ref:`custom metrics <llm-eval-custom-metrics>`,
:ref:`static dataset evaluation <llm-eval-static-dataset>`, and much more. For a
more in-depth guide, see :ref:`llm-eval`.
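
As a hedged sketch of what a custom metric can look like (``mlflow.metrics.genai.answer_similarity`` is assumed here and needs access to an LLM judge, e.g. OpenAI credentials), an extra metric is appended to the evaluation call above via ``extra_metrics``:

.. code-block:: python

    import mlflow
    from mlflow.metrics.genai import answer_similarity

    # Sketch only: re-run the evaluation above with an additional LLM-judged metric.
    # logged_model and data reuse the names from the previous example.
    with mlflow.start_run():
        results = mlflow.evaluate(
            model=logged_model,
            data=data,
            targets="ground_truth",
            model_type="text-summarization",
            extra_metrics=[answer_similarity()],
        )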

.. _deploy-prompt-serving:

