huggingface · qgallouedec · Sep 8, 2024 · Sep 6, 2024 · Sep 7, 2024 · Sep 7, 2024
diff --git a/docs/source/kto_trainer.mdx b/docs/source/kto_trainer.mdx
@@ -62,7 +62,7 @@ For a detailed example have a look at the `examples/scripts/kto.py` script. At a
 The `beta` refers to the hyperparameter of the implicit reward, and the dataset contains the 3 entries listed above. Note that the `model` and `ref_model` need to have the same architecture (ie decoder only or encoder-decoder).
 
 The `desirable_weight` and `undesirable_weight` refer to the weights placed on the losses for desirable/positive and undesirable/negative examples.
-By default, they are both 1. However, if you have more of one or the other, then you should upweight the less common type such that the ratio of (`desirable_weight` * number of positives) to (`undesirable_weight` * number of negatives) is in the range 1:1 to 4:3.
+By default, they are both 1. However, if you have more of one or the other, then you should upweight the less common type such that the ratio of (`desirable_weight` \\(\times\\) number of positives) to (`undesirable_weight` \\(\times\\) number of negatives) is in the range 1:1 to 4:3.
 
 ```py
 training_args = KTOConfig(
@@ -99,4 +99,4 @@ To scale how much the auxiliary loss contributes to the total loss, use the hype
 
 ## KTOConfig
 
-[[autodoc]] KTOConfig
+[[autodoc]] KTOConfig
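
Review note on the first hunk: the weighting rule is easier to check with a small worked calculation. The sketch below is not part of TRL. The helper names `weighted_ratio` and `suggest_weights` are hypothetical, and the choice to leave the more common class at weight 1 is an assumption made for illustration. It only applies the arithmetic in the paragraph: pick weights so that (`desirable_weight` × number of positives) / (`undesirable_weight` × number of negatives) falls between 1:1 and 4:3.

```python
def weighted_ratio(n_pos, n_neg, desirable_weight=1.0, undesirable_weight=1.0):
    """Ratio of weighted desirable examples to weighted undesirable examples."""
    return (desirable_weight * n_pos) / (undesirable_weight * n_neg)


def suggest_weights(n_pos, n_neg, target_ratio=1.0):
    """Upweight the less common class so the weighted ratio equals target_ratio.

    Hypothetical helper: the class that is already over-represented relative
    to target_ratio keeps weight 1.0 (an assumption, not a TRL rule).
    Returns (desirable_weight, undesirable_weight).
    """
    if n_pos > target_ratio * n_neg:
        # Too many positives: upweight the undesirable examples.
        return 1.0, n_pos / (target_ratio * n_neg)
    # Too few positives: upweight the desirable examples.
    return (target_ratio * n_neg) / n_pos, 1.0


def in_recommended_range(ratio, low=1.0, high=4.0 / 3.0):
    """True if the ratio lies in the 1:1 to 4:3 range from the docs."""
    return low <= ratio <= high


# Example: 3000 desirable and 1000 undesirable examples.
default_ratio = weighted_ratio(3000, 1000)        # default weights are both 1
d_w, u_w = suggest_weights(3000, 1000)            # aim for a 1:1 ratio
balanced_ratio = weighted_ratio(3000, 1000, d_w, u_w)
```

With the default weights the example dataset sits outside the recommended range. The suggested values could then be passed as `desirable_weight` and `undesirable_weight` to `KTOConfig`.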