
Add functionality to select heads to evaluate error tables on and perform dry runs. #836

Open · wants to merge 7 commits into base: main
Conversation

IsaacParker30 (Contributor)
This PR adds two new arguments:

--eval_heads
This allows the user to specify which heads to evaluate (i.e. print an error table for) at the end of training. If not set, all heads are evaluated by default.
Usage: specify the heads you want to evaluate as a single string, separated by commas. For example:
--eval_heads='default,pt_head,head3'
An example use case is replay fine-tuning, where the user may not want to evaluate the large foundation-model dataset on the pt_head.
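The comma-separated parsing described above could look roughly like this. This is a minimal sketch, not the actual implementation in this PR; the helper `parse_eval_heads` is a hypothetical name, and only the `--eval_heads` flag itself comes from the PR:

```python
import argparse

def parse_eval_heads(value: str) -> list[str]:
    # Hypothetical helper: split "default,pt_head,head3" into a list of head names
    return [h.strip() for h in value.split(",") if h.strip()]

parser = argparse.ArgumentParser()
# default=None is taken to mean "evaluate all heads" (the PR's default behaviour)
parser.add_argument("--eval_heads", type=parse_eval_heads, default=None)

args = parser.parse_args(["--eval_heads=default,pt_head,head3"])
print(args.eval_heads)  # ['default', 'pt_head', 'head3']
```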

--dry_run
Adding this argument stops the run_train.py script just before model training begins (i.e. just before the tools.train() call). This allows the user to check that their parameters are set correctly before starting an expensive training run.
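The dry-run guard can be sketched as an early return after setup and before the training call. This is an illustrative sketch, not the PR's actual code; `main` and `train_fn` are hypothetical stand-ins for run_train.py's entry point and tools.train():

```python
import argparse

def main(argv, train_fn):
    # Hypothetical entry point mimicking run_train.py's flow
    parser = argparse.ArgumentParser()
    parser.add_argument("--dry_run", action="store_true")
    args = parser.parse_args(argv)

    # ... model, dataloaders and optimizer would be configured here (cheap) ...

    if args.dry_run:
        print("Dry run: configuration finished, stopping before training.")
        return "dry_run"      # the real script stops here, before tools.train()
    return train_fn()         # the expensive training loop

result = main(["--dry_run"], train_fn=lambda: "trained")
```

The point of placing the check after all configuration but before `train_fn()` is that every argument-parsing and setup error surfaces during the dry run, while no GPU time is spent.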

I have tested:

  • That the dry run works.
  • That individual heads can be deselected.
  • That the code evaluates all heads if none are specified.
  • That the code stops if the user specifies a head to evaluate on that doesn't exist.
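The last check above (failing fast on an unknown head) could be implemented along these lines. This is a sketch under assumed names; `validate_eval_heads`, `available_heads`, and the error message are illustrative, not the PR's actual code:

```python
def validate_eval_heads(eval_heads, available_heads):
    # Hypothetical validator: raise if the user asks for a head the model lacks
    if eval_heads is None:
        return sorted(available_heads)   # default: evaluate every head
    unknown = [h for h in eval_heads if h not in available_heads]
    if unknown:
        raise ValueError(
            f"Unknown head(s) {unknown}; available heads are {sorted(available_heads)}"
        )
    return eval_heads

print(validate_eval_heads(None, {"default", "pt_head"}))  # ['default', 'pt_head']
```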

Let me know if any changes are needed.

@vue1999 vue1999 self-assigned this Feb 27, 2025
vue1999 (Collaborator) left a comment:


Please fix the pre-commit check suggestions, otherwise looks good.

vue1999 previously approved these changes Feb 28, 2025
@vue1999 vue1999 removed their assignment Feb 28, 2025
alinelena (Contributor):

@IsaacParker30 this addresses some of the points in #839; I should have checked this PR before writing that issue. I wonder if it makes sense to also implement the first part of #839 here.

alinelena (Contributor):

One small point: if you do not need the head stats at the end, you may not want to print them during training either.

alinelena (Contributor):

@IsaacParker30 this will need a small rebase


3 participants