add create_reference_model #61
Conversation
Thanks a bunch for working on this 🔥 Can't wait to try it on the trainer! Left a few minor comments :D
""" | ||
|
||
parameter_names = [n for n, _ in model.named_parameters()] | ||
ref_model = deepcopy(model) |
Do you think this can blow up the memory when manipulating very large models?
I was thinking of leveraging the accelerate.init_empty_weights context manager to initialize an empty model and populate it step by step. This could add a bit of boilerplate, so maybe let's leave it for a follow-up PR.
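For context, a minimal sketch of that idea (not what this PR implements; it assumes accelerate's init_empty_weights and set_module_tensor_to_device utilities, and uses a GPT-2 checkpoint purely for illustration):

```python
from accelerate import init_empty_weights
from accelerate.utils import set_module_tensor_to_device
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

with init_empty_weights():
    # parameters are created on the "meta" device, so no real memory is allocated here
    ref_model = AutoModelForCausalLM.from_config(model.config)

# populate the empty reference model one tensor at a time
for name, param in model.named_parameters():
    set_module_tensor_to_device(ref_model, name, "cpu", value=param.detach().clone())

# buffers would need the same treatment
for name, buf in model.named_buffers():
    set_module_tensor_to_device(ref_model, name, "cpu", value=buf.clone())
```

Populating step by step would also make it straightforward to skip (and later share) layers rather than copying everything.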
Sounds good - let's see how it works and maybe adapt it to init_empty_weights in a follow-up :)
Co-authored-by: Younes Belkada <[email protected]>
Nice feature @lvwerra 🚀 ! I left some nits and a question about how the indexing of share_layers is counted :)
trl/models/modeling_base.py (Outdated)
ref_model = deepcopy(model)

# if no layers are shared, return copy of model
if share_layers is None:
Do we need to extend this logic to handle the case when share_layers=0, or does the indexing start with 0?
indeed indexing starts at 0 so this should be fine
The documentation is not available anymore as the PR was closed or merged.
This PR adds the create_reference_model function. It can be used to create a static reference model from an existing model, and the reference model can also share layers with the original model, as sketched below.
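A hedged usage sketch (the import paths and the share_layers keyword follow the diff in this PR and may differ in the merged API; GPT-2 is just an example checkpoint):

```python
from trl import AutoModelForCausalLMWithValueHead
from trl.models.modeling_base import create_reference_model

model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")

# full static copy: every parameter is duplicated and frozen (requires_grad=False)
ref_model = create_reference_model(model)

# share the first few layers between the active and the reference model
# (keyword name taken from the diff above; assumed to mean "share the first 3 blocks")
ref_model_shared = create_reference_model(model, share_layers=3)
```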
In that case the first three layers are frozen for both models and the remaining layers can be updated in the active model.
The layers are identified via string matching of their names. This works for GPT2/BLOOM/OPT/GPT-Neo. If a custom pattern is necessary, one could use the pattern keyword.
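For an architecture that doesn't match the built-in names, a hypothetical call could look like the following (both the backbone.blocks module path and the pattern format, a string with a {layer} placeholder, are assumptions for illustration):

```python
# hypothetical custom model whose blocks live under "backbone.blocks.<i>..."
ref_model = create_reference_model(
    model,
    share_layers=3,
    pattern="backbone.blocks.{layer}",  # assumed: format string with a {layer} placeholder
)
```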