
update configs #2107

Merged: 5 commits merged into pytorch:main on Dec 6, 2024
Conversation

@felipemello1 (Contributor) commented Dec 3, 2024

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

Changelog

  1. Rename KD configs: knowledge_distillation.yaml --> 8B_to_1B_KD.yaml
  2. Put output_dir at the top of the configs (see the sketch after this list)
  3. Replace output_dir with /tmp/torchtune/{model_name}/{recipe_name} # /tmp may be deleted by your system. Change it to your preference.
    e.g.
    /tmp/torchtune/qwen2_5_0.5B/full
    /tmp/torchtune/phi3_mini/full_low_memory
    /tmp/torchtune/llama2_7B/lora_dpo_single_device
    /tmp/torchtune/qwen2_1.5_to_0.5B/KD_distributed
  4. Replace metric_logger.log_dir with ${output_dir}/logs
  5. Replace checkpointer.output_dir with ${output_dir}
  6. Replace the compile comment with "# torch.compile the model + loss, True increases speed + decreases memory"
  7. Other changes are byproducts of configs that had not yet been following the new standards
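Taken together, the top of a migrated config ends up looking roughly like the sketch below (assembled from the items above; the model name and component paths are illustrative, and unrelated sections are elided):

output_dir: /tmp/torchtune/llama3_2_1B/lora_single_device # /tmp may be deleted by your system. Change it to your preference.

metric_logger:
  _component_: torchtune.training.metric_logging.DiskLogger
  log_dir: ${output_dir}/logs

checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  output_dir: ${output_dir}

compile: False # torch.compile the model + loss, True increases speed + decreases memory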

Test plan

Eyes (manual review of the resulting config diffs).

Script:

import os


def modify_yaml_file(file_path):
    updated = {
        "updated_compile": False,
        "updated_packed": False,
        "added_compile": False,
        "added_activation_offloading": False,
        "added_packed": False,
        "added_profiler": False,
        "updated_gradient_accumulation_steps": False,
        "updated_checkpointing_comment": False,
        "updated_offloading_comment": False,
        "updated_gradient_comment": False,
        "updated_compile_comment": False,
        "updated_packed_comment": False,
        # keys below are set by later steps; pre-declared so the change report is complete
        "added_optimizer_in_bwd": False,
        "updated_optimizer_in_bwd_comment": False,
        "updated_custom_sharded_layers_comment": False,
    }

    with open(file_path, "r") as file:
        lines = file.readlines()
    # Step 2: Remove duplicate 'compile' entries
    compile_indices = [
        i for i, line in enumerate(lines) if line.strip().startswith("compile:")
    ]
    if len(compile_indices) > 1:
        # delete every 'compile:' entry; Step 4 re-adds a single canonical one
        for index in sorted(compile_indices, reverse=True):
            del lines[index]
        updated["updated_compile"] = True
    # Step 3: Move 'packed' after '_component_' and align indentation
    for i, line in enumerate(lines):
        if (
            line.strip().startswith("packed:")
            and i + 1 < len(lines)
            and "_component_" in lines[i + 1]
        ):
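            # pop() moves the '_component_' line to index i, so insert(i + 1)
            # places 'packed' right after it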
            packed_line = lines.pop(i)
            lines.insert(i + 1, packed_line)  # Insert after the _component_ line
            updated["updated_packed"] = True
            break
    # Step 4: Add 'compile' if missing
    if not any(line.strip().startswith("compile:") for line in lines):
        for i, line in enumerate(lines):
            if line.strip().startswith("max_steps_per_epoch:"):
                indentation = len(line) - len(line.lstrip())
                new_line = (
                    " " * indentation
                    + "compile: False # pytorch compile, set to true for better perf/memory\n"
                )
                lines.insert(i + 1, new_line)
                updated["added_compile"] = True
                break
    # Step 5: Add 'enable_activation_offloading' if missing
    if (
        not any(
            line.strip().startswith("enable_activation_offloading:") for line in lines
        )
        and "vision" not in file_path
        and "ppo" not in file_path
        and "dpo" not in file_path
        and "distillation" not in file_path
        and "qat" not in file_path
    ):
        for i, line in enumerate(lines):
            if line.strip().startswith("enable_activation_checkpointing:"):
                indentation = len(line) - len(line.lstrip())
                new_line = (
                    " " * indentation
                    + "enable_activation_offloading: False  # True reduces memory\n"
                )
                lines.insert(i + 1, new_line)
                updated["added_activation_offloading"] = True
                break
    # Step 6: Add 'packed' if missing
    if "dpo" not in file_path and "ppo" not in file_path:
        if (
            not any(line.strip().startswith("packed:") for line in lines)
            and "vision" not in file_path
        ):
            for i, line in enumerate(lines):
                if "_component_" in line and "dataset" in lines[i - 1]:
                    indentation = len(line) - len(line.lstrip())
                    new_line = (
                        " " * indentation + "packed: False # True increases speed\n"
                    )
                    lines.insert(i + 1, new_line)
                    updated["added_packed"] = True
                    break

    # Step 7: Replace/Add 'profiler' section if missing
    if "ppo" not in file_path and "dpo" not in file_path:
        profiler_section = """
# Profiler (disabled)
profiler:
  _component_: torchtune.training.setup_torch_profiler
  enabled: False

  #Output directory of trace artifacts
  output_dir: ${output_dir}/profiling_outputs

  #`torch.profiler.ProfilerActivity` types to trace
  cpu: True
  cuda: True

  #trace options passed to `torch.profiler.profile`
  profile_memory: False
  with_stack: False
  record_shapes: True
  with_flops: False

  # `torch.profiler.schedule` options:
  # wait_steps -> wait, warmup_steps -> warmup, active_steps -> active, num_cycles -> repeat
  wait_steps: 5
  warmup_steps: 3
  active_steps: 2
  num_cycles: 1
"""

        # Correct the 'profiler' section if it has incorrect indentation
        start_index = None
        end_index = None
        for i, line in enumerate(lines):
            if line.strip().startswith("# Profiler (disabled)"):
                start_index = i
            if line.strip().startswith("num_cycles: 1"):
                end_index = i
                break

        if start_index is not None and end_index is not None:
            # Replaces profiler

            # Remove the old section
            del lines[start_index : end_index + 1]
            # Insert the new section
            lines.insert(start_index, profiler_section)
            updated["added_profiler"] = True

        elif not any(line.strip().startswith("profiler:") for line in lines):
            lines.append(profiler_section)
            updated["added_profiler"] = True

    # Step 8: Update 'gradient_accumulation_steps' if greater than 1
    for i, line in enumerate(lines):
        if line.strip().startswith("gradient_accumulation_steps:"):
            parts = line.split(":")
            value = int(parts[1].split("#")[0].strip())
            if len(parts) > 1 and value > 1:
                lines[i] = parts[0] + ": 8\n"
                updated["updated_gradient_accumulation_steps"] = True
            break

    # Step 9: Add or replace comment for 'enable_activation_checkpointing'
    for i, line in enumerate(lines):
        if line.strip().startswith("enable_activation_checkpointing:"):
            parts = line.split("#")
            lines[i] = parts[0].rstrip() + "  # True reduces memory\n"
            updated["updated_checkpointing_comment"] = True
            break
    # Step 9.5: Add or replace comment for 'enable_activation_offloading'
    for i, line in enumerate(lines):
        if line.strip().startswith("enable_activation_offloading:"):
            parts = line.split("#")
            lines[i] = parts[0].rstrip() + "  # True reduces memory\n"
            updated["updated_offloading_comment"] = True
            break
    # Step 10: Add or replace comment for 'gradient_accumulation_steps'
    for i, line in enumerate(lines):
        if line.strip().startswith("gradient_accumulation_steps:"):
            parts = line.split("#")
            lines[i] = parts[0].rstrip() + "  # Use to increase virtual batch size\n"
            updated["updated_gradient_comment"] = True
            break
    # Step 11: Add or replace comment for 'compile'
    for i, line in enumerate(lines):
        if line.strip().startswith("compile:"):
            parts = line.split("#")
            lines[i] = (
                parts[0].rstrip()
                + "  # torch.compile the model + loss, True increases speed + decreases memory\n"
            )
            updated["updated_compile_comment"] = True
            break
    # Step 12: Add or replace comment for 'packed'
    for i, line in enumerate(lines):
        if line.strip().startswith("packed:"):
            parts = line.split("#")
            lines[i] = parts[0].rstrip() + "  # True increases speed\n"
            updated["updated_packed_comment"] = True
            break

    # for files ending with "full.yaml" or "full_single_device.yaml"
    # (parenthesized so the qat/ppo/dpo exclusions apply to both suffixes)
    if (
        (
            file_path.endswith("full.yaml")
            or file_path.endswith("full_single_device.yaml")
        )
        and "qat" not in file_path
        and "ppo" not in file_path
        and "dpo" not in file_path
    ):
        # Step 13: Add 'optimizer_in_bwd: False' if missing
        if not any(line.strip().startswith("optimizer_in_bwd:") for line in lines):
            for i, line in enumerate(lines):
                if line.strip().startswith("compile:"):
                    indentation = len(line) - len(line.lstrip())
                    new_line = " " * indentation + "optimizer_in_bwd: False\n"
                    lines.insert(i + 1, new_line)
                    updated["added_optimizer_in_bwd"] = True
                    break

    # Step 14: Add/replace comment for 'optimizer_in_bwd'
    for i, line in enumerate(lines):
        if line.strip().startswith("optimizer_in_bwd:"):
            parts = line.split("#")
            lines[i] = (
                parts[0].rstrip()
                + "  # True saves memory. Requires gradient_accumulation_steps=1\n"
            )
            updated["updated_optimizer_in_bwd_comment"] = True
            break

    # Step 14.5: Add/replace comment for 'custom_sharded_layers'
    for i, line in enumerate(lines):
        if line.strip().startswith("custom_sharded_layers:"):
            parts = line.split("#")
            lines[i] = (
                parts[0].rstrip()
                + "  # Layers to shard separately (useful for large vocab size models). Lower Memory, but lower speed.\n"
            )
            updated["updated_custom_sharded_layers_comment"] = True
            break

    # for files with lora in the name
    if "lora" in file_path or "dora" in file_path:
        for i, line in enumerate(lines):
            # Step 15: make 'lora_attn_modules: ['q_proj', 'v_proj', 'output_proj']'
            if line.strip().startswith("lora_attn_modules:"):
                lines[i] = "  lora_attn_modules: ['q_proj', 'v_proj', 'output_proj']\n"
            # Step 16: make 'apply_lora_to_mlp: True'
            elif line.strip().startswith("apply_lora_to_mlp:"):
                lines[i] = "  apply_lora_to_mlp: True\n"
            # Step 17: add comment to 'lora_rank'
            elif line.strip().startswith("lora_rank:"):
                parts = line.split("#")
                lines[i] = (
                    parts[0].rstrip() + "  # higher increases accuracy and memory\n"
                )
            # Step 18: add comment to 'lora_alpha'
            elif line.strip().startswith("lora_alpha:"):
                parts = line.split("#")
                lines[i] = parts[0].rstrip() + "  # usually alpha=2*rank\n"

    # Step 19: Move an existing top-level 'output_dir' to the top of the file
    current_output_dir = None  # stays None if no 'output_dir' line is found
    new_output_dir_index = None

    # Find index of existing 'output_dir:'
    for i, line in enumerate(lines):
        if line.startswith("output_dir:"):
            current_output_dir = line

            # Remove existing 'output_dir:'
            del lines[i]

            # if next line is empty, remove it too
            if i < len(lines) and not lines[i].strip():
                del lines[i]

            break

    # Find index of first non-empty line that is not a comment
    for i, line in enumerate(lines):
        if line.strip() and not line.startswith("#"):

            # go back until line is empty, to find a good spot for output_dir
            while i > 0 and lines[i - 1].strip():
                i -= 1

            new_output_dir_index = i

            break

    # Re-insert the saved 'output_dir:' line at the chosen spot
    if new_output_dir_index is not None and current_output_dir is not None:
        lines.insert(new_output_dir_index, current_output_dir)
        lines.insert(new_output_dir_index + 1, "\n")

    # Step 20: rename output_dir to /tmp/torchtune/model/recipe.
    # file_path = path_to_file/model_folder/{model_size}_{recipe_name}.yaml
    model_folder = file_path.split("/")[-2]
    config_name = file_path.split("/")[-1]

    if "_KD_" not in config_name:
        model_size = config_name.split("_")[0]  # e.g. 8B_lora_full.yaml
    else:
        model_size = config_name.split("_KD_")[0]  # e.g. 8B_to_1B_KD_lora_full.yaml
    model_size = model_size.replace(".", "_")

    recipe_name = config_name[len(model_size) + 1 :].replace(".yaml", "")
    new_output_dir = f"/tmp/torchtune/{model_folder}_{model_size}/{recipe_name}"
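    # e.g. recipes/configs/llama3_2/8B_to_1B_KD_lora_single_device.yaml
    #      -> /tmp/torchtune/llama3_2_8B_to_1B/KD_lora_single_device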

    for i, line in enumerate(lines):
        if line.startswith("output_dir:"):
            lines[i] = (
                f"output_dir: {new_output_dir} # /tmp may be deleted by your system. Change it to your preference.\n"
            )
            break

    # Step 21: update "metric_logger.log_dir" to ${output_dir}/logs
    for i, line in enumerate(lines):
        if line.strip().startswith("metric_logger:"):
            for j in range(i + 1, len(lines)):
                if lines[j].strip().startswith("log_dir:"):
                    spacing = len(lines[j]) - len(lines[j].lstrip())
                    lines[j] = (" " * spacing) + "log_dir: ${output_dir}/logs\n"
                    break
                elif not lines[j].strip():  # blank line marks the end of the section
                    break
            break

    # Step 22: update "checkpointer.output_dir" to ${output_dir}
    for i, line in enumerate(lines):
        if line.strip().startswith("checkpointer:"):
            for j in range(i + 1, len(lines)):
                if lines[j].strip().startswith("output_dir:"):
                    spacing = len(lines[j]) - len(lines[j].lstrip())
                    lines[j] = (" " * spacing) + "output_dir: ${output_dir}\n"
                    break
                elif not lines[j].strip():  # blank line marks the end of the section
                    break
            break

    with open(file_path, "w") as file:
        file.writelines(lines)
    return updated


def search_yaml_files(directory):
    updated_files = []
    not_updated_files = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            if "_" not in file or "generation" in file or "evaluation" in file:
                print(f"Skipping {file}")
                continue
            file_path = os.path.join(root, file)
            updates = modify_yaml_file(file_path)
            if any(updates.values()):
                updated_files.append({file_path: updates})
            else:
                not_updated_files.append(file_path)
    print("Updated files and changes:")
    for update in updated_files:
        print(update)
    print("\nFiles not updated:")
    for file in not_updated_files:
        print(file)


directory = "recipes/configs"
search_yaml_files(directory)
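A note on usage: the script assumes it is run from the torchtune repo root (so that recipes/configs resolves), ideally on a clean git tree; the per-file update flags it prints, together with git diff, make the changes easy to review.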

pytorch-bot bot commented Dec 3, 2024

🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2107
Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures
As of commit a1d0eda with merge base 32e265d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Dec 3, 2024
@pbontrager (Contributor) left a comment

Some small comments on names but otherwise looks good

@@ -18,6 +18,8 @@
# best to use 8B_full_single_device.yaml for those cases


output_dir: /tmp/torchtune/dev_8B/full_experimental # /tmp may be deleted by your system. Change it to your preference.
Contributor:
What is this recipe? Shouldn't it be under a dev/model structure?

Contributor:
This is the dev recipe for selective activation checkpointing. I think we should decide what we wanna do with this feature (either integrate it by default or scrap it, cause I don't like that we currently expose two different AC APIs). I think it still provides parity with vanilla AC so we could just turn it on everywhere given requests like #2101

Contributor:
Created #2114 to continue the discussion there

@@ -57,8 +59,8 @@ optimizer:
loss:
_component_: torchtune.modules.loss.CEWithChunkedOutputLoss
max_steps_per_epoch: null
gradient_accumulation_steps: 1 # Use to increase virtual batch size
compile: False # pytorch compile, set to true for better perf/memory
gradient_accumulation_steps: 1 # Use to increase effective batch size
Contributor:
Not sure what makes this experimental, but did you double check that gradient_acc and compile work with this?

Contributor Author:
no, but this change was just a comment change

@@ -10,11 +10,13 @@
# tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device
#
# To launch on a single device, run the following command from root:
# tune run knowledge_distillation_single_device --config llama3_2/knowledge_distillation_single_device
# tune run knowledge_distillation_single_device --config llama3_2/8B_to_1B_KD_single_device
Contributor:
Can you add lora to the name?

Contributor Author:
done

@felipemello1 felipemello1 merged commit 2b1ee6d into pytorch:main Dec 6, 2024
17 checks passed
rahul-sarvam added a commit to sarvamai/torchtune that referenced this pull request Dec 8, 2024

rahul-sarvam added a commit to sarvamai/torchtune that referenced this pull request Dec 9, 2024
rahul-sarvam added a commit to sarvamai/torchtune that referenced this pull request Dec 18, 2024
rahul-sarvam pushed a commit to sarvamai/torchtune that referenced this pull request Dec 23, 2024
rahul-sarvam pushed a commit to sarvamai/torchtune that referenced this pull request Dec 23, 2024