From a3fa57068cab10dbf42ef705444733699a66e976 Mon Sep 17 00:00:00 2001
From: Mattan Yeroushalmi
Date: Sat, 7 Sep 2024 02:00:50 +0300
Subject: [PATCH 1/3] correct formatting of star sign in kto_trainer.mdx

The "*" symbol in markdown doesn't show. I changed it to $\times$ so the mathematical formula is clearer
---
 docs/source/kto_trainer.mdx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/kto_trainer.mdx b/docs/source/kto_trainer.mdx
index a40d2f9e0b..2288dc37b7 100644
--- a/docs/source/kto_trainer.mdx
+++ b/docs/source/kto_trainer.mdx
@@ -62,7 +62,7 @@ For a detailed example have a look at the `examples/scripts/kto.py` script. At a
 The `beta` refers to the hyperparameter of the implicit reward, and the dataset contains the 3 entries listed above. Note that the `model` and `ref_model` need to have the same architecture (ie decoder only or encoder-decoder).
 
 The `desirable_weight` and `undesirable_weight` refer to the weights placed on the losses for desirable/positive and undesirable/negative examples.
-By default, they are both 1. However, if you have more of one or the other, then you should upweight the less common type such that the ratio of (`desirable_weight` * number of positives) to (`undesirable_weight` * number of negatives) is in the range 1:1 to 4:3.
+By default, they are both 1. However, if you have more of one or the other, then you should upweight the less common type such that the ratio of (`desirable_weight` $\times$ number of positives) to (`undesirable_weight` $\times$ number of negatives) is in the range 1:1 to 4:3.
 
 ```py
 training_args = KTOConfig(
@@ -99,4 +99,4 @@ To scale how much the auxiliary loss contributes to the total loss, use the hype
 
 ## KTOConfig
 
-[[autodoc]] KTOConfig
\ No newline at end of file
+[[autodoc]] KTOConfig

From 6c183b5204c054658a766f96d980beb1a3d04a53 Mon Sep 17 00:00:00 2001
From: Mattan Yeroushalmi
Date: Sat, 7 Sep 2024 13:14:06 +0300
Subject: [PATCH 2/3] fix markdown

---
 docs/source/kto_trainer.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/kto_trainer.mdx b/docs/source/kto_trainer.mdx
index 2288dc37b7..5421744f0f 100644
--- a/docs/source/kto_trainer.mdx
+++ b/docs/source/kto_trainer.mdx
@@ -62,7 +62,7 @@ For a detailed example have a look at the `examples/scripts/kto.py` script. At a
 The `beta` refers to the hyperparameter of the implicit reward, and the dataset contains the 3 entries listed above. Note that the `model` and `ref_model` need to have the same architecture (ie decoder only or encoder-decoder).
 
 The `desirable_weight` and `undesirable_weight` refer to the weights placed on the losses for desirable/positive and undesirable/negative examples.
-By default, they are both 1. However, if you have more of one or the other, then you should upweight the less common type such that the ratio of (`desirable_weight` $\times$ number of positives) to (`undesirable_weight` $\times$ number of negatives) is in the range 1:1 to 4:3.
+By default, they are both 1. However, if you have more of one or the other, then you should upweight the less common type such that the ratio of (`desirable_weight` \\(\\times\\) number of positives) to (`undesirable_weight` \\(\\times\\) number of negatives) is in the range 1:1 to 4:3.
 
 ```py
 training_args = KTOConfig(

From 4e43778375487d526a3e3ac25a4dea5a90451c4e Mon Sep 17 00:00:00 2001
From: Mattan Yeroushalmi
Date: Sat, 7 Sep 2024 18:33:07 +0300
Subject: [PATCH 3/3] one more try

---
 docs/source/kto_trainer.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/kto_trainer.mdx b/docs/source/kto_trainer.mdx
index 5421744f0f..3fa10e8c8a 100644
--- a/docs/source/kto_trainer.mdx
+++ b/docs/source/kto_trainer.mdx
@@ -62,7 +62,7 @@ For a detailed example have a look at the `examples/scripts/kto.py` script. At a
 The `beta` refers to the hyperparameter of the implicit reward, and the dataset contains the 3 entries listed above. Note that the `model` and `ref_model` need to have the same architecture (ie decoder only or encoder-decoder).
 
 The `desirable_weight` and `undesirable_weight` refer to the weights placed on the losses for desirable/positive and undesirable/negative examples.
-By default, they are both 1. However, if you have more of one or the other, then you should upweight the less common type such that the ratio of (`desirable_weight` \\(\\times\\) number of positives) to (`undesirable_weight` \\(\\times\\) number of negatives) is in the range 1:1 to 4:3.
+By default, they are both 1. However, if you have more of one or the other, then you should upweight the less common type such that the ratio of (`desirable_weight` \\(\times\\) number of positives) to (`undesirable_weight` \\(\times\\) number of negatives) is in the range 1:1 to 4:3.
 
 ```py
 training_args = KTOConfig(