Added error check to RLOO, PPOv2, OnlineDPO that ref_policy and policy have different identities #4236
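Below is a minimal sketch of the kind of check the title describes. It assumes the trainers compare the two models by Python object identity (`is`), not by their weights. The names `check_distinct_policies` and `Model` are hypothetical placeholders, not TRL's API, and the error message is illustrative. The likely motivation for the check: if `ref_policy` were the same object as `policy`, every optimizer step on the policy would also move the reference, so a KL penalty against the reference would stay at zero.

```python
import copy


class Model:
    """Hypothetical stand-in for a policy network (not a TRL class)."""

    def __init__(self, weights):
        self.weights = weights


def check_distinct_policies(policy, ref_policy):
    """Hypothetical helper: raise if ref_policy is the very same object as policy."""
    # `is` tests object identity, not equality: a separate copy with identical
    # weights passes, but passing the same object for both arguments fails.
    if ref_policy is policy:
        raise ValueError(
            "`policy` and `ref_policy` must be different objects; "
            "pass a copy (e.g. copy.deepcopy(policy)) as `ref_policy`."
        )


policy = Model([0.1, 0.2])
check_distinct_policies(policy, copy.deepcopy(policy))  # accepted: distinct objects

try:
    check_distinct_policies(policy, policy)  # rejected: same object twice
except ValueError as err:
    print("rejected:", err)
```

Checking identity rather than equality keeps the test cheap and avoids rejecting a legitimate frozen copy that happens to start with the same weights as the policy.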