[KTOTrainer] add BCO (reward shift and underlying distribution matching) #1599
Merged
younesbelkada merged 11 commits into huggingface:main from seanexp:unpaired_bco on Apr 30, 2024
+564 -23