Ensure additional metadata to trackers don't throw error in happy case. #290

Merged

Conversation

dushyantbehl
Contributor

@dushyantbehl dushyantbehl commented Aug 7, 2024

Description of the change

Ensure that additional metadata passed to trackers does not throw errors in the happy case.
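
The actual fix lives in the tracker code; the following is only a minimal sketch of the intended guarded behaviour (the `Tracker`/`track_metadata` names are hypothetical and are not the fms-hf-tuning API):

```python
# Hypothetical sketch: a tracker that tolerates missing/empty additional
# metadata (the "happy case") instead of raising, while still rejecting
# genuinely malformed values.
from typing import Any, Dict, Optional


class Tracker:
    """Toy stand-in for an experiment tracker such as Aim."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {}

    def track_metadata(self, metadata: Optional[Dict[str, Any]] = None) -> None:
        if not metadata:
            # Happy case: nothing extra was supplied, so do nothing rather
            # than raise a ValueError on a None/empty value.
            return
        if not isinstance(metadata, dict):
            raise ValueError("additional metadata must be a dict of key/value pairs")
        self.params.update(metadata)


tracker = Tracker()
tracker.track_metadata()                                   # no-op, no exception
tracker.track_metadata({"experiment": "aim-test-distributed-main"})
print(tracker.params)
```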

Related issue number

Fixes #289

How to verify the PR

Verified with a two-GPU FSDP fine-tuning run using the Aim tracker:

(.fms-hf-tuning-venv) ➜  fms-hf-tuning git:(fix-bad-value-error) ✗ accelerate launch \
--main_process_port 1234 \
--config_file /home/schrodinger/workspace/ibm/fms-hf-tuning/fixtures/accelerate_fsdp_defaults.yaml \
--num_processes 2 \
-m tuning.sft_trainer \
--model_name_or_path "Maykeye/TinyLLama-v0" \
--training_data_path "/home/schrodinger/bin/twitter_complaints.json" \
--output_dir /home/schrodinger/bin/output/tiny-llama-ft-multigpu \
--num_train_epochs 5 \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--per_device_eval_batch_size 4 \
--learning_rate 0.000001 \
--use_flash_attn no \
--response_template "\n### Label:" \
--dataset_text_field "output" \
--tracker aim \
--aim_repo /home/schrodinger/bin/aimrepo/ \
--experiment aim-test-distributed-main \
--torch_dtype float16 
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/accelerate/utils/launch.py:253: FutureWarning: `fsdp_backward_prefetch_policy` is deprecated and will be removed in version 0.27.0 of 🤗 Accelerate. Use `fsdp_backward_prefetch` instead
  warnings.warn(
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
max_seq_length 4096 exceeds tokenizer.model_max_length             2048, using tokenizer.model_max_length 2048
max_seq_length 4096 exceeds tokenizer.model_max_length             2048, using tokenizer.model_max_length 2048
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/transformers/training_args.py:2007: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': dataset_text_field, max_seq_length. Will not be supported from version '1.0.0'.

Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
  warnings.warn(message, FutureWarning)
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:280: UserWarning: You passed a `max_seq_length` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:318: UserWarning: You passed a `dataset_text_field` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/transformers/training_args.py:2007: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead.
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py:100: FutureWarning: Deprecated argument(s) used in '__init__': dataset_text_field, max_seq_length. Will not be supported from version '1.0.0'.

Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
  warnings.warn(message, FutureWarning)
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:280: UserWarning: You passed a `max_seq_length` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:318: UserWarning: You passed a `dataset_text_field` argument to the SFTTrainer, the value you passed will override the one in the `SFTConfig`.
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:408: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:408: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.
  warnings.warn(
{'loss': 22276357.5138, 'grad_norm': nan, 'learning_rate': 8e-07, 'epoch': 1.0}                                                                                                                                                               
 20%|███████████████████████████████████████▌                                                                                                                                                              | 325/1625 [01:54<07:28,  2.90it/s]/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/storage.py:414: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(io.BytesIO(b))
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/storage.py:414: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(io.BytesIO(b))
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 6e-07, 'epoch': 2.0}                                                                                                                                                                         
 40%|███████████████████████████████████████████████████████████████████████████████▏                                                                                                                      | 650/1625 [03:48<06:13,  2.61it/s]/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/storage.py:414: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(io.BytesIO(b))
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 4e-07, 'epoch': 3.0}                                                                                                                                                                         
 60%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                                                                               | 975/1625 [05:42<03:29,  3.10it/s]/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/storage.py:414: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(io.BytesIO(b))
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 2e-07, 'epoch': 4.0}                                                                                                                                                                         
 80%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌                                       | 1300/1625 [07:34<01:50,  2.94it/s]/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/storage.py:414: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(io.BytesIO(b))
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1625/1625 [09:28<00:00,  2.94it/s]/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/storage.py:414: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(io.BytesIO(b))
{'loss': 0.0, 'grad_norm': nan, 'learning_rate': 0.0, 'epoch': 5.0}                                                                                                                                                                           
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1625/1625 [09:28<00:00,  2.94it/s]/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:689: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
/home/schrodinger/.fms-hf-tuning-venv/lib/python3.10/site-packages/torch/storage.py:414: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return torch.load(io.BytesIO(b))
{'train_runtime': 568.8356, 'train_samples_per_second': 91.415, 'train_steps_per_second': 2.857, 'train_loss': 4455271.502769231, 'epoch': 5.0}                                                                                               
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1625/1625 [09:28<00:00,  2.86it/s]

Was the PR tested

  • I have added >=1 unit test(s) for every new method I have added.
  • [x] I have ensured all unit tests pass
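
A happy-case unit test along these lines could look roughly as follows (a self-contained pytest sketch with invented names; it mirrors the guard sketched above, not the repository's actual test suite):

```python
# Hypothetical pytest sketch: passing no additional metadata must not raise,
# while malformed metadata still does. Names are invented for illustration.
from typing import Any, Dict, Optional

import pytest


class _StubTracker:
    def __init__(self) -> None:
        self.params: Dict[str, Any] = {}

    def track_metadata(self, metadata: Optional[Dict[str, Any]] = None) -> None:
        if not metadata:
            return  # happy case: silently ignore None/empty metadata
        if not isinstance(metadata, dict):
            raise ValueError("additional metadata must be a dict")
        self.params.update(metadata)


def test_no_additional_metadata_does_not_raise():
    tracker = _StubTracker()
    tracker.track_metadata(None)
    tracker.track_metadata({})
    assert tracker.params == {}


def test_malformed_metadata_still_raises():
    tracker = _StubTracker()
    with pytest.raises(ValueError):
        tracker.track_metadata("not-a-dict")  # type: ignore[arg-type]
```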

@ashokponkumar
Collaborator

@kmehant PTAL

@kmehant kmehant merged commit baeecf1 into foundation-model-stack:main Aug 7, 2024
7 checks passed
anhuong added a commit that referenced this pull request Aug 14, 2024
* Set default value of target_modules to be None in LoraConfig

Signed-off-by: Will Johnson <[email protected]>

* Removal of transformers logger and addition of python logger

Signed-off-by: Abhishek <[email protected]>

* FMT and lint check: Removal of transformers logger and addition of python logger

Signed-off-by: Abhishek <[email protected]>

* fix: remove lm_head for granite with llama arch models (#258)

* initial code for deleting lm_head

Signed-off-by: Anh-Uong <[email protected]>

* fix logic for copying checkpoint

Signed-off-by: Anh-Uong <[email protected]>

* fix check that embed_tokens and lm_head weights are the same

Signed-off-by: Anh-Uong <[email protected]>

* fix warning assertion

Signed-off-by: Anh-Uong <[email protected]>

* fix lm_head check, remove test

Signed-off-by: Anh-Uong <[email protected]>

* small fixes from code review

Signed-off-by: Anh-Uong <[email protected]>

* fmt

Signed-off-by: Anh-Uong <[email protected]>

---------

Signed-off-by: Anh-Uong <[email protected]>
Co-authored-by: Anh-Uong <[email protected]>
Signed-off-by: Abhishek <[email protected]>

* Add config_utils tests

Signed-off-by: Angel Luu <[email protected]>

* Fix fmt

Signed-off-by: Angel Luu <[email protected]>

* Separate tests out and use docstrings

Signed-off-by: Angel Luu <[email protected]>

* Update more field/value checks from HF defaults

Signed-off-by: Angel Luu <[email protected]>

* Fix: Addition of env var TRANSFORMERS_VERBOSITY check

Signed-off-by: Abhishek <[email protected]>

* FMT Fix: Addition of env var TRANSFORMERS_VERBOSITY check

Signed-off-by: Abhishek <[email protected]>

* Add test for tokenizer in lora config (should be ignored)

Signed-off-by: Angel Luu <[email protected]>

* Adding logging support to accelerate launch

Signed-off-by: Abhishek <[email protected]>

* FMT_FIX: Adding logging support to accelerate launch

Signed-off-by: Abhishek <[email protected]>

* bug: On save event added to callback (#256)

* feat: On save event added to callback

Signed-off-by: Padmanabha V Seshadri <[email protected]>

* fix: Removed additional bracket

Signed-off-by: Padmanabha V Seshadri <[email protected]>

* fix: Removed additional bracket

Signed-off-by: Padmanabha V Seshadri <[email protected]>

* fix: Format issues resolved

Signed-off-by: Padmanabha V Seshadri <[email protected]>

* fix: rebase with upstream and add new line

Signed-off-by: Mehant Kammakomati <[email protected]>

---------

Signed-off-by: Padmanabha V Seshadri <[email protected]>
Signed-off-by: Mehant Kammakomati <[email protected]>
Co-authored-by: Mehant Kammakomati <[email protected]>

* feat: All metric handling changes (#263)

* feat: All metric handling changes

Signed-off-by: Padmanabha V Seshadri <[email protected]>

* fix: Format issues

Signed-off-by: Padmanabha V Seshadri <[email protected]>

---------

Signed-off-by: Padmanabha V Seshadri <[email protected]>

* feat: Configuration to set logging level for trigger log (#241)

* feat: Added the triggered login in the operation

Signed-off-by: Padmanabha V Seshadri <[email protected]>

* fix: Formatting issues

Signed-off-by: Padmanabha V Seshadri <[email protected]>

* fix: Added default config

Signed-off-by: Padmanabha V Seshadri <[email protected]>

* fix: Moved the variable to right scope

Signed-off-by: Padmanabha V Seshadri <[email protected]>

* fix: Checked added to validate config log level

Signed-off-by: Padmanabha V Seshadri <[email protected]>

* fix: Removed some unwanted log file

Signed-off-by: Padmanabha V Seshadri <[email protected]>

---------

Signed-off-by: Padmanabha V Seshadri <[email protected]>

* limit peft deps until investigate (#274)

Signed-off-by: Anh-Uong <[email protected]>

* Data custom collator (#260)

* refactor code to preprocess datasets

Co-authored-by: Alex-Brooks <[email protected]>
Signed-off-by: Sukriti-Sharma4 <[email protected]>

* fix formatting

Co-authored-by: Alex-Brooks <[email protected]>
Signed-off-by: Sukriti-Sharma4 <[email protected]>

* allow input/output in validate args

Co-authored-by: Alex-Brooks <[email protected]>
Signed-off-by: Sukriti-Sharma4 <[email protected]>

* format input/output JSON and mask

Co-authored-by: Alex-Brooks <[email protected]>
Signed-off-by: Sukriti-Sharma4 <[email protected]>

* function to return suitable collator

Co-authored-by: Alex-Brooks <[email protected]>
Signed-off-by: Sukriti-Sharma4 <[email protected]>

* add tests for SFT Trainer input/output format

Co-authored-by: Alex-Brooks <[email protected]>
Signed-off-by: Sukriti-Sharma4 <[email protected]>

* remove unused functions

Co-authored-by: Alex-Brooks <[email protected]>
Signed-off-by: Sukriti-Sharma4 <[email protected]>

* add eos token to input/output format

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* fix tests

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* improve docstrings

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* keeping JSON keys constant

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* support for input/output format

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* formatting fixes

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* update README formats

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* formatting README

Signed-off-by: Sukriti-Sharma4 <[email protected]>

---------

Signed-off-by: Sukriti-Sharma4 <[email protected]>
Co-authored-by: Alex-Brooks <[email protected]>

* Revert "limit peft deps until investigate (#274)" (#275)

This reverts commit f57ff63.

Signed-off-by: Anh-Uong <[email protected]>

* feat: per process state metric (#239)

Signed-off-by: Harikrishnan Balagopal <[email protected]>

* Modify test to pass with target_modules: None

Signed-off-by: Will Johnson <[email protected]>

* Logging changes and unit tests added

Signed-off-by: Abhishek <[email protected]>

* feat: Add a dockerfile argument to enable aimstack (#261)

* Add a dockerfile argument at the end of final layer to enable aimstack.
Currently guarded by a Dockerfile argument.

Signed-off-by: Dushyant Behl <[email protected]>

* Set the default value of ENABLE_AIM to false

Signed-off-by: Dushyant Behl <[email protected]>

---------

Signed-off-by: Dushyant Behl <[email protected]>

* Solved conflict with main

Signed-off-by: Abhishek <[email protected]>

* FMT:Fix Solved conflict with main

Signed-off-by: Abhishek <[email protected]>

* enabling tests for prompt tuning

Signed-off-by: Abhishek <[email protected]>

* feat: Support pretokenized (#272)

* feat: support pretokenized datasets

Signed-off-by: Mehant Kammakomati <[email protected]>

* fix: rebase with upstream and review commits

Signed-off-by: Mehant Kammakomati <[email protected]>

* fix: rebase with upstream and review commits

Signed-off-by: Mehant Kammakomati <[email protected]>

* fix: rebase with upstream and review commits

Signed-off-by: Mehant Kammakomati <[email protected]>

* consolidate collator code

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* add valuerrors for incorrect args

Signed-off-by: Sukriti-Sharma4 <[email protected]>

* feat: add unit tests for validate_data_args and format_dataset

Signed-off-by: Mehant Kammakomati <[email protected]>

* feat: add unit tests for validate_data_args and format_dataset

Signed-off-by: Mehant Kammakomati <[email protected]>

* feat: add unit tests for validate_data_args and format_dataset

Signed-off-by: Mehant Kammakomati <[email protected]>

* feat: add unit tests for validate_data_args and format_dataset

Signed-off-by: Mehant Kammakomati <[email protected]>

---------

Signed-off-by: Mehant Kammakomati <[email protected]>
Signed-off-by: Sukriti-Sharma4 <[email protected]>
Co-authored-by: Sukriti-Sharma4 <[email protected]>
Co-authored-by: Alex Brooks <[email protected]>

* Update packaging requirement from <24,>=23.2 to >=23.2,<25 (#212)

Updates the requirements on [packaging](https://github.com/pypa/packaging) to permit the latest version.
- [Release notes](https://github.com/pypa/packaging/releases)
- [Changelog](https://github.com/pypa/packaging/blob/main/CHANGELOG.rst)
- [Commits](pypa/packaging@23.2...24.1)

---
updated-dependencies:
- dependency-name: packaging
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Anh Uong <[email protected]>

* enabling tests for prompt tuning (#278)

Signed-off-by: Abhishek <[email protected]>
Co-authored-by: Anh Uong <[email protected]>

* fix: do not add special tokens for custom tokenizer (#279)

Signed-off-by: Mehant Kammakomati <[email protected]>

* PR changes for changing logger

Signed-off-by: Abhishek <[email protected]>

* fix: bug where the logger was not being used properly (#286)

Signed-off-by: Hari <[email protected]>

* Unit Tests changes

Signed-off-by: Abhishek <[email protected]>

* Add functionality to free disk space from Github Actions (#287)

* Add functionality to free disk space from Github Actions

Signed-off-by: Will Johnson <[email protected]>

* Add functionality to free disk space from Github Actions, relocate from build-and-publish.yaml to image.yaml

Signed-off-by: Will Johnson <[email protected]>

* Move freeing space step to before building image

Signed-off-by: Will Johnson <[email protected]>

---------

Signed-off-by: Will Johnson <[email protected]>

* commented os.environ[LOG_LEVEL] in accelerate.py for testing

Signed-off-by: Abhishek <[email protected]>

* PR changes

Signed-off-by: Abhishek <[email protected]>

* FIX:FMT

Signed-off-by: Abhishek <[email protected]>

* PR Changes

Signed-off-by: Abhishek <[email protected]>

* PR Changes

Signed-off-by: Abhishek <[email protected]>

* Add unit test to verify target_modules defaults correctly (#281)

* Add unit test to verify target_modules defaults correctly

Signed-off-by: Will Johnson <[email protected]>

* Add sft_trainer.main test to ensure target modules properly default for LoRA when set to None from CLI

Signed-off-by: Will Johnson <[email protected]>

* fmt

Signed-off-by: Will Johnson <[email protected]>

* Use model_args instead of importing, fix nits

Signed-off-by: Will Johnson <[email protected]>

* Add test to ensure target_modules defaults to None in job config

Signed-off-by: Will Johnson <[email protected]>

* Add additional check, fix nits

Signed-off-by: Will Johnson <[email protected]>

---------

Signed-off-by: Will Johnson <[email protected]>

* docs: Add documentation on experiment tracking. (#257)

Signed-off-by: Dushyant Behl <[email protected]>

* Ensure additional metadata to trackers don't throw error in happy case. (#290)

Signed-off-by: Dushyant Behl <[email protected]>

* PR Changes

Signed-off-by: Abhishek <[email protected]>

* fix multiple runid creation bug with accelerate. (#268)

Signed-off-by: Dushyant Behl <[email protected]>

* feat: logging control operation (#264)

Signed-off-by: Padmanabha V Seshadri <[email protected]>

* Metrics file epoch indexing from 0

Signed-off-by: Abhishek <[email protected]>

* Revert last commit

Signed-off-by: Abhishek <[email protected]>

* fix run evaluation to get base model path (#273)

Signed-off-by: Anh-Uong <[email protected]>

* PR Changes

Signed-off-by: Abhishek <[email protected]>

* PR Changes

Signed-off-by: Abhishek <[email protected]>

* feat: Added additional events such as on_step_begin, on_optimizer_step, on_substep_end (#293)

Signed-off-by: Padmanabha V Seshadri <[email protected]>

* Always update setuptools to latest (#288)

Signed-off-by: James Busche <[email protected]>
Co-authored-by: Anh Uong <[email protected]>

* Rename all fixtures with correct .jsonl extension (#295)

Signed-off-by: Will Johnson <[email protected]>
Co-authored-by: Anh Uong <[email protected]>

* feat: add save_model_dir flag where final checkpoint saved (#291)

* add save_model_dir flag for final checkpoint

Signed-off-by: Anh-Uong <[email protected]>

* remove output_dir logic, add save method

Signed-off-by: Anh-Uong <[email protected]>

* update accelerate_launch, remove save tokenizer

Signed-off-by: Anh-Uong <[email protected]>

* fix: put back creation of .complete file

Signed-off-by: Anh-Uong <[email protected]>

* fix failing tests and add new ones

Signed-off-by: Anh-Uong <[email protected]>

* tests: add sft_trainer test to train and save

- small refactor of tests

Signed-off-by: Anh-Uong <[email protected]>

* add docs on saving checkpoints and fix help msg

Signed-off-by: Anh-Uong <[email protected]>

* update example and note best checkpoint

Signed-off-by: Anh-Uong <[email protected]>

* changes based on PR review

Signed-off-by: Anh-Uong <[email protected]>

* add logging to save, fix error out properly

Signed-off-by: Anh-Uong <[email protected]>

---------

Signed-off-by: Anh-Uong <[email protected]>

---------

Signed-off-by: Will Johnson <[email protected]>
Signed-off-by: Abhishek <[email protected]>
Signed-off-by: Anh-Uong <[email protected]>
Signed-off-by: Angel Luu <[email protected]>
Signed-off-by: Padmanabha V Seshadri <[email protected]>
Signed-off-by: Mehant Kammakomati <[email protected]>
Signed-off-by: Sukriti-Sharma4 <[email protected]>
Signed-off-by: Harikrishnan Balagopal <[email protected]>
Signed-off-by: Dushyant Behl <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Hari <[email protected]>
Signed-off-by: James Busche <[email protected]>
Co-authored-by: Abhishek <[email protected]>
Co-authored-by: Sukriti Sharma <[email protected]>
Co-authored-by: Anh-Uong <[email protected]>
Co-authored-by: Abhishek Maurya <[email protected]>
Co-authored-by: Angel Luu <[email protected]>
Co-authored-by: Angel Luu <[email protected]>
Co-authored-by: Padmanabha V Seshadri <[email protected]>
Co-authored-by: Mehant Kammakomati <[email protected]>
Co-authored-by: Alex-Brooks <[email protected]>
Co-authored-by: Hari <[email protected]>
Co-authored-by: Dushyant Behl <[email protected]>
Co-authored-by: Sukriti-Sharma4 <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: James Busche <[email protected]>
@dushyantbehl dushyantbehl deleted the fix-bad-value-error branch December 20, 2024 05:56
Successfully merging this pull request may close these issues.

bug: tracker params in happy case can cause exception.