
update configs #2107

Merged: 5 commits merged into pytorch:main on Dec 6, 2024
Conversation

@felipemello1 (Contributor) commented Dec 3, 2024

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

Changelog

  1. Rename KD configs: knowledge_distillation.yaml --> 8B_to_1B_KD.yaml
  2. Put output_dir at the top of the configs (see the sketch after this list)
  3. Replace output_dir with /tmp/torchtune/{model_name}/{recipe_name} # /tmp may be deleted by your system. Change it to your preference.
    e.g.
    /tmp/torchtune/qwen2_5_0.5B/full
    /tmp/torchtune/phi3_mini/full_low_memory
    /tmp/torchtune/llama2_7B/lora_dpo_single_device
    /tmp/torchtune/qwen2_1.5_to_0.5B/KD_distributed
  4. Replace metric_logger.log_dir with ${output_dir}/logs
  5. Replace checkpointer.output_dir with ${output_dir}
  6. Replace the compile comment with "# torch.compile the model + loss, True increases speed + decreases memory"
  7. Other changes are byproducts of configs that had not yet been following the new standards
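Taken together, the top of a migrated config ends up looking roughly like the sketch below (assembled from the items above; the model name and component paths are illustrative, and unrelated sections are elided):

output_dir: /tmp/torchtune/llama3_2_1B/lora_single_device # /tmp may be deleted by your system. Change it to your preference.

metric_logger:
  _component_: torchtune.training.metric_logging.DiskLogger
  log_dir: ${output_dir}/logs

checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  output_dir: ${output_dir}

compile: False # torch.compile the model + loss, True increases speed + decreases memory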

Test plan

Eyes (manual review of the resulting config diffs).

Script:

import os


def modify_yaml_file(file_path):
    updated = {
        "updated_compile": False,
        "updated_packed": False,
        "added_compile": False,
        "added_activation_offloading": False,
        "added_packed": False,
        "added_profiler": False,
        "updated_gradient_accumulation_steps": False,
        "updated_checkpointing_comment": False,
        "updated_offloading_comment": False,
        "updated_gradient_comment": False,
        "updated_compile_comment": False,
        "updated_packed_comment": False,
        # keys below are set by later steps; pre-declared so the change report is complete
        "added_optimizer_in_bwd": False,
        "updated_optimizer_in_bwd_comment": False,
        "updated_custom_sharded_layers_comment": False,
    }

    with open(file_path, "r") as file:
        lines = file.readlines()
    # Step 2: Remove duplicate 'compile' entries
    compile_indices = [
        i for i, line in enumerate(lines) if line.strip().startswith("compile:")
    ]
    if len(compile_indices) > 1:
        # delete every 'compile:' entry; Step 4 re-adds a single canonical one
        for index in sorted(compile_indices, reverse=True):
            del lines[index]
        updated["updated_compile"] = True
    # Step 3: Move 'packed' after '_component_' and align indentation
    for i, line in enumerate(lines):
        if (
            line.strip().startswith("packed:")
            and i + 1 < len(lines)
            and "_component_" in lines[i + 1]
        ):
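            # pop() moves the '_component_' line to index i, so insert(i + 1)
            # places 'packed' right after it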
            packed_line = lines.pop(i)
            lines.insert(i + 1, packed_line)  # Insert after the _component_ line
            updated["updated_packed"] = True
            break
    # Step 4: Add 'compile' if missing
    if not any(line.strip().startswith("compile:") for line in lines):
        for i, line in enumerate(lines):
            if line.strip().startswith("max_steps_per_epoch:"):
                indentation = len(line) - len(line.lstrip())
                new_line = (
                    " " * indentation
                    + "compile: False # pytorch compile, set to true for better perf/memory\n"
                )
                lines.insert(i + 1, new_line)
                updated["added_compile"] = True
                break
    # Step 5: Add 'enable_activation_offloading' if missing
    if (
        not any(
            line.strip().startswith("enable_activation_offloading:") for line in lines
        )
        and "vision" not in file_path
        and "ppo" not in file_path
        and "dpo" not in file_path
        and "distillation" not in file_path
        and "qat" not in file_path
    ):
        for i, line in enumerate(lines):
            if line.strip().startswith("enable_activation_checkpointing:"):
                indentation = len(line) - len(line.lstrip())
                new_line = (
                    " " * indentation
                    + "enable_activation_offloading: False  # True reduces memory\n"
                )
                lines.insert(i + 1, new_line)
                updated["added_activation_offloading"] = True
                break
    # Step 6: Add 'packed' if missing
    if "dpo" not in file_path and "ppo" not in file_path:
        if (
            not any(line.strip().startswith("packed:") for line in lines)
            and "vision" not in file_path
        ):
            for i, line in enumerate(lines):
                if "_component_" in line and "dataset" in lines[i - 1]:
                    indentation = len(line) - len(line.lstrip())
                    new_line = (
                        " " * indentation + "packed: False # True increases speed\n"
                    )
                    lines.insert(i + 1, new_line)
                    updated["added_packed"] = True
                    break

    # Step 7: Replace/Add 'profiler' section if missing
    if "ppo" not in file_path and "dpo" not in file_path:
        profiler_section = """
# Profiler (disabled)
profiler:
  _component_: torchtune.training.setup_torch_profiler
  enabled: False

  #Output directory of trace artifacts
  output_dir: ${output_dir}/profiling_outputs

  #`torch.profiler.ProfilerActivity` types to trace
  cpu: True
  cuda: True

  #trace options passed to `torch.profiler.profile`
  profile_memory: False
  with_stack: False
  record_shapes: True
  with_flops: False

  # `torch.profiler.schedule` options:
  # wait_steps -> wait, warmup_steps -> warmup, active_steps -> active, num_cycles -> repeat
  wait_steps: 5
  warmup_steps: 3
  active_steps: 2
  num_cycles: 1
"""

        # Correct the 'profiler' section if it has incorrect indentation
        start_index = None
        end_index = None
        for i, line in enumerate(lines):
            if line.strip().startswith("# Profiler (disabled)"):
                start_index = i
            if line.strip().startswith("num_cycles: 1"):
                end_index = i
                break

        if start_index is not None and end_index is not None:
            # Replaces profiler

            # Remove the old section
            del lines[start_index : end_index + 1]
            # Insert the new section
            lines.insert(start_index, profiler_section)
            updated["added_profiler"] = True

        elif not any(line.strip().startswith("profiler:") for line in lines):
            lines.append(profiler_section)
            updated["added_profiler"] = True

    # Step 8: Update 'gradient_accumulation_steps' if greater than 1
    for i, line in enumerate(lines):
        if line.strip().startswith("gradient_accumulation_steps:"):
            parts = line.split(":")
            value = int(parts[1].split("#")[0].strip())
            if len(parts) > 1 and value > 1:
                lines[i] = parts[0] + ": 8\n"
                updated["updated_gradient_accumulation_steps"] = True
            break

    # Step 9: Add or replace comment for 'enable_activation_checkpointing'
    for i, line in enumerate(lines):
        if line.strip().startswith("enable_activation_checkpointing:"):
            parts = line.split("#")
            lines[i] = parts[0].rstrip() + "  # True reduces memory\n"
            updated["updated_checkpointing_comment"] = True
            break
    # Step 9.5: Add or replace comment for 'enable_activation_offloading'
    for i, line in enumerate(lines):
        if line.strip().startswith("enable_activation_offloading:"):
            parts = line.split("#")
            lines[i] = parts[0].rstrip() + "  # True reduces memory\n"
            updated["updated_offloading_comment"] = True
            break
    # Step 10: Add or replace comment for 'gradient_accumulation_steps'
    for i, line in enumerate(lines):
        if line.strip().startswith("gradient_accumulation_steps:"):
            parts = line.split("#")
            lines[i] = parts[0].rstrip() + "  # Use to increase virtual batch size\n"
            updated["updated_gradient_comment"] = True
            break
    # Step 11: Add or replace comment for 'compile'
    for i, line in enumerate(lines):
        if line.strip().startswith("compile:"):
            parts = line.split("#")
            lines[i] = (
                parts[0].rstrip()
                + "  # torch.compile the model + loss, True increases speed + decreases memory\n"
            )
            updated["updated_compile_comment"] = True
            break
    # Step 12: Add or replace comment for 'packed'
    for i, line in enumerate(lines):
        if line.strip().startswith("packed:"):
            parts = line.split("#")
            lines[i] = parts[0].rstrip() + "  # True increases speed\n"
            updated["updated_packed_comment"] = True
            break

    # for files ending with "full.yaml" or "full_single_device.yaml"
    # (parenthesized so the qat/ppo/dpo exclusions apply to both suffixes)
    if (
        (
            file_path.endswith("full.yaml")
            or file_path.endswith("full_single_device.yaml")
        )
        and "qat" not in file_path
        and "ppo" not in file_path
        and "dpo" not in file_path
    ):
        # Step 13: Add 'optimizer_in_bwd: False' if missing
        if not any(line.strip().startswith("optimizer_in_bwd:") for line in lines):
            for i, line in enumerate(lines):
                if line.strip().startswith("compile:"):
                    indentation = len(line) - len(line.lstrip())
                    new_line = " " * indentation + "optimizer_in_bwd: False\n"
                    lines.insert(i + 1, new_line)
                    updated["added_optimizer_in_bwd"] = True
                    break

    # Step 14: Add/replace comment for 'optimizer_in_bwd'
    for i, line in enumerate(lines):
        if line.strip().startswith("optimizer_in_bwd:"):
            parts = line.split("#")
            lines[i] = (
                parts[0].rstrip()
                + "  # True saves memory. Requires gradient_accumulation_steps=1\n"
            )
            updated["updated_optimizer_in_bwd_comment"] = True
            break

    # Step 14.5: Add/replace comment for 'custom_sharded_layers'
    for i, line in enumerate(lines):
        if line.strip().startswith("custom_sharded_layers:"):
            parts = line.split("#")
            lines[i] = (
                parts[0].rstrip()
                + "  # Layers to shard separately (useful for large vocab size models). Lower Memory, but lower speed.\n"
            )
            updated["updated_custom_sharded_layers_comment"] = True
            break

    # for files with lora in the name
    if "lora" in file_path or "dora" in file_path:
        for i, line in enumerate(lines):
            # Step 15: make 'lora_attn_modules: ['q_proj', 'v_proj', 'output_proj']'
            if line.strip().startswith("lora_attn_modules:"):
                lines[i] = "  lora_attn_modules: ['q_proj', 'v_proj', 'output_proj']\n"
            # Step 16: make 'apply_lora_to_mlp: True'
            elif line.strip().startswith("apply_lora_to_mlp:"):
                lines[i] = "  apply_lora_to_mlp: True\n"
            # Step 17: add comment to 'lora_rank'
            elif line.strip().startswith("lora_rank:"):
                parts = line.split("#")
                lines[i] = (
                    parts[0].rstrip() + "  # higher increases accuracy and memory\n"
                )
            # Step 18: add comment to 'lora_alpha'
            elif line.strip().startswith("lora_alpha:"):
                parts = line.split("#")
                lines[i] = parts[0].rstrip() + "  # usually alpha=2*rank\n"

    # Step 19: Move an existing top-level 'output_dir' to the top of the file
    current_output_dir = None  # stays None if no 'output_dir' line is found
    new_output_dir_index = None

    # Find index of existing 'output_dir:'
    for i, line in enumerate(lines):
        if line.startswith("output_dir:"):
            current_output_dir = line

            # Remove existing 'output_dir:'
            del lines[i]

            # if next line is empty, remove it too
            if i < len(lines) and not lines[i].strip():
                del lines[i]

            break

    # Find index of first non-empty line that is not a comment
    for i, line in enumerate(lines):
        if line.strip() and not line.startswith("#"):

            # go back until line is empty, to find a good spot for output_dir
            while i > 0 and lines[i - 1].strip():
                i -= 1

            new_output_dir_index = i

            break

    # Re-insert the saved 'output_dir:' line at the chosen spot
    if new_output_dir_index is not None and current_output_dir is not None:
        lines.insert(new_output_dir_index, current_output_dir)
        lines.insert(new_output_dir_index + 1, "\n")

    # Step 20: rename output_dir to /tmp/torchtune/model/recipe.
    # file_path = path_to_file/model_folder/{model_size}_{recipe_name}.yaml
    model_folder = file_path.split("/")[-2]
    config_name = file_path.split("/")[-1]

    if "_KD_" not in config_name:
        model_size = config_name.split("_")[0]  # e.g. 8B_lora_full.yaml
    else:
        model_size = config_name.split("_KD_")[0]  # e.g. 8B_to_1B_KD_lora_full.yaml
    model_size = model_size.replace(".", "_")

    recipe_name = config_name[len(model_size) + 1 :].replace(".yaml", "")
    new_output_dir = f"/tmp/torchtune/{model_folder}_{model_size}/{recipe_name}"
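    # e.g. recipes/configs/llama3_2/8B_to_1B_KD_lora_single_device.yaml
    #      -> /tmp/torchtune/llama3_2_8B_to_1B/KD_lora_single_device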

    for i, line in enumerate(lines):
        if line.startswith("output_dir:"):
            lines[i] = (
                f"output_dir: {new_output_dir} # /tmp may be deleted by your system. Change it to your preference.\n"
            )
            break

    # Step 21: update "metric_logger.log_dir" to ${output_dir}/logs
    for i, line in enumerate(lines):
        if line.strip().startswith("metric_logger:"):
            for j in range(i + 1, len(lines)):
                if lines[j].strip().startswith("log_dir:"):
                    spacing = len(lines[j]) - len(lines[j].lstrip())
                    lines[j] = (" " * spacing) + "log_dir: ${output_dir}/logs\n"
                    break
                elif not lines[j].strip():  # blank line marks the end of the section
                    break
            break

    # Step 22: update "checkpointer.output_dir" to ${output_dir}
    for i, line in enumerate(lines):
        if line.strip().startswith("checkpointer:"):
            for j in range(i + 1, len(lines)):
                if lines[j].strip().startswith("output_dir:"):
                    spacing = len(lines[j]) - len(lines[j].lstrip())
                    lines[j] = (" " * spacing) + "output_dir: ${output_dir}\n"
                    break
                elif not lines[j].strip():  # blank line marks the end of the section
                    break
            break

    with open(file_path, "w") as file:
        file.writelines(lines)
    return updated


def search_yaml_files(directory):
    updated_files = []
    not_updated_files = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            if "_" not in file or "generation" in file or "evaluation" in file:
                print(f"Skipping {file}")
                continue
            file_path = os.path.join(root, file)
            updates = modify_yaml_file(file_path)
            if any(updates.values()):
                updated_files.append({file_path: updates})
            else:
                not_updated_files.append(file_path)
    print("Updated files and changes:")
    for update in updated_files:
        print(update)
    print("\nFiles not updated:")
    for file in not_updated_files:
        print(file)


directory = "recipes/configs"
search_yaml_files(directory)
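A note on usage: the script assumes it is run from the torchtune repo root (so that recipes/configs resolves), ideally on a clean git tree; the per-file update flags it prints, together with git diff, make the changes easy to review.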

pytorch-bot bot commented Dec 3, 2024

🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2107
Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures
As of commit a1d0eda with merge base 32e265d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Dec 3, 2024
@pbontrager (Contributor) left a comment

Some small comments on names but otherwise looks good

@@ -18,6 +18,8 @@
# best to use 8B_full_single_device.yaml for those cases


output_dir: /tmp/torchtune/dev_8B/full_experimental # /tmp may be deleted by your system. Change it to your preference.
Contributor:
What is this recipe? Shouldn't it be under a dev/model structure?

Contributor:
This is the dev recipe for selective activation checkpointing. I think we should decide what we wanna do with this feature (either integrate it by default or scrap it, cause I don't like that we currently expose two different AC APIs). I think it still provides parity with vanilla AC so we could just turn it on everywhere given requests like #2101

Contributor:
Created #2114 to continue the discussion there

@@ -57,8 +59,8 @@ optimizer:
loss:
_component_: torchtune.modules.loss.CEWithChunkedOutputLoss
max_steps_per_epoch: null
gradient_accumulation_steps: 1 # Use to increase virtual batch size
compile: False # pytorch compile, set to true for better perf/memory
gradient_accumulation_steps: 1 # Use to increase effective batch size
Contributor:
Not sure what makes this experimental, but did you double check that gradient_acc and compile work with this?

Contributor Author:
no, but this change was just a comment change

@@ -10,11 +10,13 @@
# tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device
#
# To launch on a single device, run the following command from root:
# tune run knowledge_distillation_single_device --config llama3_2/knowledge_distillation_single_device
# tune run knowledge_distillation_single_device --config llama3_2/8B_to_1B_KD_single_device
Contributor:
Can you add lora to the name?

Contributor Author:
done

@felipemello1 felipemello1 merged commit 2b1ee6d into pytorch:main Dec 6, 2024
17 checks passed
rahul-sarvam added a commit to sarvamai/torchtune that referenced this pull request Dec 8, 2024

rahul-sarvam added a commit to sarvamai/torchtune that referenced this pull request Dec 9, 2024
rahul-sarvam added a commit to sarvamai/torchtune that referenced this pull request Dec 18, 2024
rahul-sarvam pushed a commit to sarvamai/torchtune that referenced this pull request Dec 23, 2024
rahul-sarvam pushed a commit to sarvamai/torchtune that referenced this pull request Dec 23, 2024