
Commit

add example
Signed-off-by: Daniel Lok <[email protected]>
daniellok-db committed Nov 15, 2023
1 parent 5e34600 commit 1b57f81
Showing 3 changed files with 52 additions and 2 deletions.
Binary file added docs/source/_static/images/evaluate_metrics.png
6 changes: 6 additions & 0 deletions docs/source/llms/llm-evaluate/index.rst
@@ -125,6 +125,8 @@ There are two ways to select metrics to evaluate your model:
* Use **default** metrics for pre-defined model types.
* Use a **custom** list of metrics.
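As a rough illustration of the difference (not part of this change; the model URI, sample data, and built-in metric helpers below are placeholders to verify against your MLflow version), default metrics come from ``model_type``, while a custom list is passed through ``extra_metrics``:

.. code-block:: python

    import mlflow
    import pandas as pd

    # Hypothetical evaluation data; column names are placeholders.
    eval_df = pd.DataFrame(
        {
            "inputs": ["Summarize: MLflow tracks experiments, models, and artifacts."],
            "ground_truth": ["MLflow tracks experiments, models, and artifacts."],
        }
    )

    # Option 1: default metrics bundled with a pre-defined model type.
    default_results = mlflow.evaluate(
        model="runs:/<run_id>/model",  # hypothetical logged-model URI
        data=eval_df,
        targets="ground_truth",
        model_type="text-summarization",
    )

    # Option 2: an explicit list of metrics passed via extra_metrics.
    custom_results = mlflow.evaluate(
        model="runs:/<run_id>/model",
        data=eval_df,
        targets="ground_truth",
        extra_metrics=[mlflow.metrics.toxicity(), mlflow.metrics.latency()],
    )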

.. _llm-eval-default-metrics:

Use Default Metrics for Pre-defined Model Types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -173,6 +175,8 @@ The supported LLM model types and associated metrics are listed below:
:sup:`3` Requires packages `evaluate <https://pypi.org/project/evaluate>`_, `nltk <https://pypi.org/project/nltk>`_, and
`rouge-score <https://pypi.org/project/rouge-score>`_

.. _llm-eval-custom-metrics:

Use a Custom List of Metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -441,6 +445,8 @@ up OpenAI authentication to run the code below.
        model_type="question-answering",
    )

.. _llm-eval-static-dataset:

Evaluating with a Static Dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

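Only the heading is visible here, so as a rough sketch of the idea (an assumption to check against the full guide), static evaluation passes a table of pre-computed outputs and names the predictions column instead of supplying a model:

.. code-block:: python

    import mlflow
    import pandas as pd

    # Hypothetical table of pre-computed model outputs; no model object is passed.
    static_df = pd.DataFrame(
        {
            "inputs": ["What is MLflow?"],
            "ground_truth": ["MLflow is an open source platform for the ML lifecycle."],
            "outputs": ["MLflow is an open source MLOps platform."],
        }
    )

    results = mlflow.evaluate(
        data=static_df,
        predictions="outputs",
        targets="ground_truth",
        model_type="question-answering",
    )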
48 changes: 46 additions & 2 deletions docs/source/llms/prompt-engineering/index.rst
@@ -296,9 +296,53 @@ or you can :ref:`deploy it for real-time serving <deploy-prompt-serving>`.
Step 13: Perform metric-based evaluation of your model's outputs
----------------------------------------------------------------
If you'd like to assess your model's performance on specific metrics, MLflow provides the :py:func:`mlflow.evaluate()`
API.
API. Let's evaluate our model on some :ref:`pre-defined metrics <llm-eval-default-metrics>`
for text summarization:

You can learn more about LLM evaluation at the :ref:`llm-eval` page.
.. code-block:: python

    import mlflow
    import pandas as pd

    logged_model = "runs:/840a5c43f3fb46f2a2059b761557c1d0/model"

    article_text = """
    An MLflow Project is a format for packaging data science code in a reusable and reproducible way.
    The MLflow Projects component includes an API and command-line tools for running projects, which
    also integrate with the Tracking component to automatically record the parameters and git commit
    of your source code for reproducibility.
    This article describes the format of an MLflow Project and how to run an MLflow project remotely
    using the MLflow CLI, which makes it easy to vertically scale your data science code.
    """
    question = "What is an MLflow project?"

    data = pd.DataFrame(
        {
            "article": [article_text],
            "question": [question],
            "ground_truth": [article_text],  # used for certain evaluation metrics, such as ROUGE score
        }
    )

    with mlflow.start_run():
        results = mlflow.evaluate(
            model=logged_model,
            data=data,
            targets="ground_truth",
            model_type="text-summarization",
        )

    eval_table = results.tables["eval_results_table"]
    print(f"See evaluation table below: \n{eval_table}")

The evaluation results can also be viewed in the MLflow Evaluation UI:

.. figure:: ../../_static/images/evaluate_metrics.png
   :scale: 40%
   :align: center

The :py:func:`mlflow.evaluate()` API also supports :ref:`custom metrics <llm-eval-custom-metrics>`,
:ref:`static dataset evaluation <llm-eval-static-dataset>`, and much more. For a
more in-depth guide, see :ref:`llm-eval`.
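
As a hedged sketch of what a custom metric can look like (``mlflow.metrics.genai.answer_similarity`` is assumed here and needs access to an LLM judge, e.g. OpenAI credentials), an extra metric is appended to the evaluation call above via ``extra_metrics``:

.. code-block:: python

    import mlflow
    from mlflow.metrics.genai import answer_similarity

    # Sketch only: re-run the evaluation above with an additional LLM-judged metric.
    # logged_model and data reuse the names from the previous example.
    with mlflow.start_run():
        results = mlflow.evaluate(
            model=logged_model,
            data=data,
            targets="ground_truth",
            model_type="text-summarization",
            extra_metrics=[answer_similarity()],
        )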

.. _deploy-prompt-serving:

