update configs #2107
@@ -18,6 +18,8 @@
 # best to use 8B_full_single_device.yaml for those cases

+output_dir: /tmp/torchtune/dev_8B/full_experimental # /tmp may be deleted by your system. Change it to your preference.
+
 # Tokenizer
 tokenizer:
   _component_: torchtune.models.llama3.llama3_tokenizer
@@ -42,7 +44,7 @@ checkpointer:
     consolidated.00.pth
   ]
   recipe_checkpoint: null
-  output_dir: /tmp/Meta-Llama-3-8B/
+  output_dir: ${output_dir}
   model_type: LLAMA3
 resume_from_checkpoint: False
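A note on the ${output_dir} pattern introduced here (and in the log_dir change further down): torchtune parses these YAML configs with OmegaConf, so ${output_dir} interpolates the top-level output_dir value added near the top of the file. Below is a minimal sketch of how that resolution works, using a trimmed-down stand-in config rather than the real recipe file.

# Illustration only, not part of the PR: how the ${output_dir} references resolve.
from omegaconf import OmegaConf

cfg = OmegaConf.create("""
output_dir: /tmp/torchtune/dev_8B/full_experimental
checkpointer:
  output_dir: ${output_dir}
metric_logger:
  log_dir: ${output_dir}/logs
""")

resolved = OmegaConf.to_container(cfg, resolve=True)
print(resolved["checkpointer"]["output_dir"])  # /tmp/torchtune/dev_8B/full_experimental
print(resolved["metric_logger"]["log_dir"])    # /tmp/torchtune/dev_8B/full_experimental/logs

The practical upshot of the diff is that every path in the config now hangs off a single top-level output_dir instead of several hard-coded /tmp locations.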
@@ -57,8 +59,8 @@ optimizer:
 loss:
   _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
 max_steps_per_epoch: null
-gradient_accumulation_steps: 1 # Use to increase virtual batch size
-compile: False # pytorch compile, set to true for better perf/memory
+gradient_accumulation_steps: 1 # Use to increase effective batch size
+compile: False # torch.compile the model + loss, True increases speed + decreases memory

 # Training env
 device: cuda

Review comment: Not sure what makes this experimental, but did you double check that gradient_acc and compile work with this?

Reply: No, but this change was just a comment change.
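For readers skimming the reworded comments: gradient accumulation sums gradients over several micro-batches before a single optimizer step, so the effective batch size is batch_size * gradient_accumulation_steps per device, and compile: True roughly corresponds to wrapping the model with torch.compile. The sketch below is an illustration of that pattern, not torchtune's actual recipe loop.

import torch

def maybe_compile(model: torch.nn.Module, compile_model: bool) -> torch.nn.Module:
    # compile: True in the config roughly maps to torch.compile on the model (and loss).
    return torch.compile(model) if compile_model else model

def run_accumulated_steps(model, loss_fn, optimizer, micro_batches, gradient_accumulation_steps):
    # Gradients from several micro-batches accumulate in param.grad before one step.
    optimizer.zero_grad()
    for i, (inputs, labels) in enumerate(micro_batches):
        # Scale each micro-batch loss so the summed gradient matches one large batch.
        loss = loss_fn(model(inputs), labels) / gradient_accumulation_steps
        loss.backward()
        if (i + 1) % gradient_accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()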
@@ -77,11 +79,11 @@ dtype: bf16
 # Logging
 metric_logger:
   _component_: torchtune.training.metric_logging.DiskLogger
-  log_dir: ${output_dir}
-output_dir: /tmp/alpaca-llama3-finetune
+  log_dir: ${output_dir}/logs
 log_every_n_steps: null
 log_peak_memory_stats: True

 # Profiler (disabled)
 profiler:
   _component_: torchtune.training.setup_torch_profiler
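Related background on why log_dir sits under metric_logger: in these configs, every key that is a sibling of _component_ is passed as a keyword argument when the component is instantiated (torchtune ships its own config.instantiate utility for this). The helper below is a hypothetical, simplified version of that pattern; only the DiskLogger path and the log_dir field come from the diff above.

from importlib import import_module
from omegaconf import OmegaConf

def instantiate(node):
    # Hypothetical, simplified stand-in for _component_-style instantiation:
    # import the dotted path, then pass the remaining keys as keyword arguments.
    kwargs = OmegaConf.to_container(node, resolve=True)
    module_path, _, attr_name = kwargs.pop("_component_").rpartition(".")
    return getattr(import_module(module_path), attr_name)(**kwargs)

# With the metric_logger section from this diff, this builds a DiskLogger
# that writes under <output_dir>/logs:
# node = OmegaConf.create({
#     "_component_": "torchtune.training.metric_logging.DiskLogger",
#     "log_dir": "/tmp/torchtune/dev_8B/full_experimental/logs",
# })
# logger = instantiate(node)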
Review comment: What is this recipe? Shouldn't it be under a dev/model structure?

Reply: This is the dev recipe for selective activation checkpointing. I think we should decide what we want to do with this feature (either integrate it by default or scrap it, because I don't like that we currently expose two different AC APIs). I think it still provides parity with vanilla AC, so we could just turn it on everywhere given requests like #2101.

Reply: Created #2114 to continue the discussion there.
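Background for readers following the thread: activation checkpointing (AC) trades compute for memory by recomputing intermediate activations during the backward pass instead of storing them, and the "selective" variant applies this to only a subset of layers rather than all of them. The sketch below illustrates the idea with torch.utils.checkpoint and a hypothetical checkpoint-every-n-layers policy; it is not torchtune's selective AC API.

import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class SelectiveACStack(nn.Module):
    # Hypothetical illustration of selective activation checkpointing: only
    # every n-th layer is checkpointed (its activations recomputed in backward),
    # the rest run normally. "Vanilla" AC would checkpoint every layer.
    def __init__(self, layers: nn.ModuleList, checkpoint_every_n: int = 2):
        super().__init__()
        self.layers = layers
        self.checkpoint_every_n = checkpoint_every_n

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            if i % self.checkpoint_every_n == 0:
                # use_reentrant=False is the recommended mode in recent PyTorch.
                x = checkpoint(layer, x, use_reentrant=False)
            else:
                x = layer(x)
        return x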