From a3fa57068cab10dbf42ef705444733699a66e976 Mon Sep 17 00:00:00 2001
From: Mattan Yeroushalmi <mattany@gmail.com>
Date: Sat, 7 Sep 2024 02:00:50 +0300
Subject: [PATCH 1/3] correct formatting of star sign in kto_trainer.mdx

The "*" symbol does not render in the Markdown output of the weight-ratio sentence. Replace it with $\times$ so the formula is clearer, and add the missing newline at the end of the file.
---
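For reference, a minimal sketch of the weighting rule in the sentence this patch touches. The helper names `kto_weights` and `weighted_ratio` are hypothetical and not part of TRL. The sketch assumes you target the 1:1 end of the documented 1:1 to 4:3 range and upweight only the less common class.

```python
def kto_weights(n_desirable: int, n_undesirable: int) -> tuple[float, float]:
    """Return (desirable_weight, undesirable_weight) for the given counts.

    Hypothetical helper: the less common class is upweighted so that
    desirable_weight * n_desirable == undesirable_weight * n_undesirable.
    """
    if n_desirable <= 0 or n_undesirable <= 0:
        raise ValueError("both counts must be positive")
    if n_desirable >= n_undesirable:
        # Undesirable examples are rarer (or equal), so upweight them.
        return 1.0, n_desirable / n_undesirable
    # Desirable examples are rarer, so upweight them.
    return n_undesirable / n_desirable, 1.0


def weighted_ratio(desirable_weight: float, n_desirable: int,
                   undesirable_weight: float, n_undesirable: int) -> float:
    """Ratio of weighted positives to weighted negatives."""
    return (desirable_weight * n_desirable) / (undesirable_weight * n_undesirable)


# Example: 1000 desirable and 250 undesirable examples.
d_w, u_w = kto_weights(1000, 250)
ratio = weighted_ratio(d_w, 1000, u_w, 250)
in_range = 1.0 <= ratio <= 4.0 / 3.0  # documented range is 1:1 to 4:3
```

The resulting values would then be passed as `desirable_weight` and `undesirable_weight` to `KTOConfig`. Any other pair of weights whose ratio falls between 1:1 and 4:3 satisfies the documented guidance equally well.
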
 docs/source/kto_trainer.mdx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/kto_trainer.mdx b/docs/source/kto_trainer.mdx
index a40d2f9e0b..2288dc37b7 100644
--- a/docs/source/kto_trainer.mdx
+++ b/docs/source/kto_trainer.mdx
@@ -62,7 +62,7 @@ For a detailed example have a look at the `examples/scripts/kto.py` script. At a
 The `beta` refers to the hyperparameter of the implicit reward, and the dataset contains the 3 entries listed above. Note that the `model` and `ref_model` need to have the same architecture (ie decoder only or encoder-decoder).
 
 The `desirable_weight` and `undesirable_weight` refer to the weights placed on the losses for desirable/positive and undesirable/negative examples.
-By default, they are both 1. However, if you have more of one or the other, then you should upweight the less common type such that the ratio of (`desirable_weight` * number of positives) to (`undesirable_weight` * number of negatives) is in the range 1:1 to 4:3.
+By default, they are both 1. However, if you have more of one or the other, then you should upweight the less common type such that the ratio of (`desirable_weight` $\times$ number of positives) to (`undesirable_weight` $\times$ number of negatives) is in the range 1:1 to 4:3.
 
 ```py
 training_args = KTOConfig(
@@ -99,4 +99,4 @@ To scale how much the auxiliary loss contributes to the total loss, use the hype
 
 ## KTOConfig
 
-[[autodoc]] KTOConfig
\ No newline at end of file
+[[autodoc]] KTOConfig

From 6c183b5204c054658a766f96d980beb1a3d04a53 Mon Sep 17 00:00:00 2001
From: Mattan Yeroushalmi <mattany@gmail.com>
Date: Sat, 7 Sep 2024 13:14:06 +0300
Subject: [PATCH 2/3] use \\( \\) instead of $ $ as inline math delimiters in kto_trainer.mdx

---
 docs/source/kto_trainer.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/kto_trainer.mdx b/docs/source/kto_trainer.mdx
index 2288dc37b7..5421744f0f 100644
--- a/docs/source/kto_trainer.mdx
+++ b/docs/source/kto_trainer.mdx
@@ -62,7 +62,7 @@ For a detailed example have a look at the `examples/scripts/kto.py` script. At a
 The `beta` refers to the hyperparameter of the implicit reward, and the dataset contains the 3 entries listed above. Note that the `model` and `ref_model` need to have the same architecture (ie decoder only or encoder-decoder).
 
 The `desirable_weight` and `undesirable_weight` refer to the weights placed on the losses for desirable/positive and undesirable/negative examples.
-By default, they are both 1. However, if you have more of one or the other, then you should upweight the less common type such that the ratio of (`desirable_weight` $\times$ number of positives) to (`undesirable_weight` $\times$ number of negatives) is in the range 1:1 to 4:3.
+By default, they are both 1. However, if you have more of one or the other, then you should upweight the less common type such that the ratio of (`desirable_weight` \\(\\times\\) number of positives) to (`undesirable_weight` \\(\\times\\) number of negatives) is in the range 1:1 to 4:3.
 
 ```py
 training_args = KTOConfig(

From 4e43778375487d526a3e3ac25a4dea5a90451c4e Mon Sep 17 00:00:00 2001
From: Mattan Yeroushalmi <mattany@gmail.com>
Date: Sat, 7 Sep 2024 18:33:07 +0300
Subject: [PATCH 3/3] drop the extra backslash before \times in kto_trainer.mdx

---
 docs/source/kto_trainer.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/kto_trainer.mdx b/docs/source/kto_trainer.mdx
index 5421744f0f..3fa10e8c8a 100644
--- a/docs/source/kto_trainer.mdx
+++ b/docs/source/kto_trainer.mdx
@@ -62,7 +62,7 @@ For a detailed example have a look at the `examples/scripts/kto.py` script. At a
 The `beta` refers to the hyperparameter of the implicit reward, and the dataset contains the 3 entries listed above. Note that the `model` and `ref_model` need to have the same architecture (ie decoder only or encoder-decoder).
 
 The `desirable_weight` and `undesirable_weight` refer to the weights placed on the losses for desirable/positive and undesirable/negative examples.
-By default, they are both 1. However, if you have more of one or the other, then you should upweight the less common type such that the ratio of (`desirable_weight` \\(\\times\\) number of positives) to (`undesirable_weight` \\(\\times\\) number of negatives) is in the range 1:1 to 4:3.
+By default, they are both 1. However, if you have more of one or the other, then you should upweight the less common type such that the ratio of (`desirable_weight` \\(\times\\) number of positives) to (`undesirable_weight` \\(\times\\) number of negatives) is in the range 1:1 to 4:3.
 
 ```py
 training_args = KTOConfig(