
Fixing special_tokens arg in Llama3VisionTransform #2000

Merged

Conversation

SalmanMohammadi
Collaborator

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)
                 ()
                /\
               //\\
              <<  >>
          ()   \\//   ()
()._____   /\   \\   /\   _____.()
   \.--.\ //\\ //\\ //\\ /.--./
    \\__\\/__\//__\//__\\/__//
     '--/\\--//\--//\--/\\--'
        \\\\///\\//\\\////
    ()-= >>\\< <\\> >\\<< =-()
        ////\\\//\\///\\\\
     .--\\/--\//--\//--\//--.
    //""/\\""//\""//\""//\""\\
   /'--'/ \\// \\// \\// \'--'\
 ()`"""`   \/   //   \/   `""""`()
          ()   //\\   ()
              <<  >>
               \\//
                \/
                ()


pytorch-bot bot commented Nov 13, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2000

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 78249a6 with merge base 18d97f0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 13, 2024
Contributor

@felipemello1 left a comment


just before i approve it, was it tested? Could you add a bit of context about how you found this issue?

Thank you :))

Comment on lines +38 to +40
special_tokens_path (Optional[str]): Path to ``tokenizer.json`` from Hugging Face
model files that contains all registered special tokens, or a local json file
structured similarly. Default is None to use the canonical Llama3 special tokens.
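For context on what that docstring is describing: torchtune's `parse_hf_tokenizer_json` (its source appears in the traceback further down) reads the `added_tokens` list out of such a JSON file and returns a token-string-to-id mapping. A minimal self-contained sketch of that behavior; the helper name `parse_special_tokens` and the temp-file setup are illustrative, and the two token ids shown are the canonical Llama3 ones:

```python
import json
import os
import tempfile

def parse_special_tokens(tokenizer_json_path: str) -> dict:
    # Mirrors torchtune's parse_hf_tokenizer_json: pull the
    # special-token str -> id mapping out of "added_tokens".
    with open(tokenizer_json_path, "r") as f:
        tokenizer_json = json.load(f)
    return {t["content"]: t["id"] for t in tokenizer_json["added_tokens"]}

# Minimal file with the structure the parser expects.
sample = {
    "added_tokens": [
        {"content": "<|begin_of_text|>", "id": 128000},
        {"content": "<|end_of_text|>", "id": 128001},
    ]
}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(sample, f)
    sample_path = f.name

special_tokens = parse_special_tokens(sample_path)
os.unlink(sample_path)
```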
Contributor


this makes sense to me! Do we know how this happened? i.e. is it possible that this is a bug in other models too?

Contributor


I think we only have the Llama3.2 Transform so it only applies to this model

Contributor

@RdoubleA left a comment


Thanks for the fix as always

@SalmanMohammadi
Collaborator Author

just before i approve it, was it tested? Could you add a bit of context about how you found this issue?

Thank you :))

I did indeed test it out manually. I was trying to add special tokens when using this transform, but I found the argument wasn't named/typed correctly (it should be identical to the Llama3 tokenizer builder). This PR changes the name/typing/docstring to match - the functionality was correct as-is.

@SalmanMohammadi
Collaborator Author

SalmanMohammadi commented Nov 14, 2024

Don't merge just yet - I just tested the 3.2 vision transform builder too. It looks like we do parse the special tokens JSON into a mapping here, but then pass that mapping as the special_tokens_path: str arg to the underlying tokenizer. This should be changed too, right? @RdoubleA
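The double-parse bug described above can be sketched in isolation; the function names here are stand-ins for the real torchtune builders, not the actual API. The outer builder parses the JSON into a dict, then forwards that dict to a parameter that expects a path, so `open()` raises the TypeError shown in the traceback below:

```python
import json
import tempfile

def parse_json(path):
    # Stand-in for parse_hf_tokenizer_json: opens the file, so passing
    # a dict instead of a path raises TypeError inside open().
    with open(path, "r") as f:
        return {t["content"]: t["id"] for t in json.load(f)["added_tokens"]}

def tokenizer_builder(special_tokens_path=None):
    # Stand-in for llama3_tokenizer: parses the path itself.
    return parse_json(special_tokens_path) if special_tokens_path is not None else None

def broken_transform_builder(special_tokens_path=None):
    # The bug on main: parse here, then pass the resulting *dict*
    # on as if it were still a path.
    special_tokens = (
        parse_json(special_tokens_path) if special_tokens_path is not None else None
    )
    return tokenizer_builder(special_tokens_path=special_tokens)

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"added_tokens": [{"content": "<|image|>", "id": 128256}]}, f)
    json_path = f.name

try:
    broken_transform_builder(special_tokens_path=json_path)
except TypeError as e:
    print(e)  # expected str, bytes or os.PathLike object, not dict
```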

@SalmanMohammadi
Collaborator Author

On main:

from torchtune.models.llama3_2_vision import llama3_2_vision_transform
# special_tokens.json reformatted from https://huggingface.co/meta-llama/Llama-3.2-11B-Vision/raw/main/tokenizer_config.json
transform = llama3_2_vision_transform(
    path="./Llama-3.2-11B-Vision-Instruct/original/tokenizer.model",
    special_tokens_path="./special_tokens.json"
)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[14], line 3
      1 from torchtune.models.llama3_2_vision import llama3_2_vision_transform
----> 3 transform = llama3_2_vision_transform(
      4     path="./Llama-3.2-11B-Vision-Instruct/original/tokenizer.model",
      5     special_tokens_path="./special_tokens.json"
      6 )

File ~/projects/torchtune/torchtune/models/llama3_2_vision/_model_builders.py:61, in llama3_2_vision_transform(path, max_seq_len, image_size, special_tokens_path, prompt_template)
     53 special_tokens = (
     54     parse_hf_tokenizer_json(special_tokens_path)
     55     if special_tokens_path is not None
     56     else None
     57 )
     58 template = (
     59     _get_prompt_template(prompt_template) if prompt_template is not None else None
     60 )
---> 61 return Llama3VisionTransform(
     62     path=path,
     63     special_tokens=special_tokens,
     64     tile_size=image_size,
     65     patch_size=14,
     66     max_num_tiles=4,
     67     max_seq_len=max_seq_len,
     68     image_mean=(0.48145466, 0.4578275, 0.40821073),
     69     image_std=(0.26862954, 0.26130258, 0.27577711),
     70     prompt_template=template,
     71 )

File ~/projects/torchtune/torchtune/models/llama3_2_vision/_transform.py:77, in Llama3VisionTransform.__init__(self, path, tile_size, patch_size, max_num_tiles, special_tokens, max_seq_len, image_mean, image_std, prompt_template)
     64 def __init__(
     65     self,
     66     path: str,
   (...)
     75     prompt_template: Optional[PromptTemplate] = None,
     76 ):
---> 77     self.tokenizer = llama3_tokenizer(
     78         path,
     79         special_tokens_path=special_tokens,
     80         max_seq_len=max_seq_len,
     81         prompt_template=prompt_template,
     82     )
     83     self.transform_image = CLIPImageTransform(
     84         image_mean=image_mean,
     85         image_std=image_std,
   (...)
     90         resize_to_max_canvas=False,
     91     )
     92     self.xattn_mask = VisionCrossAttentionMask(
     93         tile_size=tile_size,
     94         patch_size=patch_size,
     95         image_token_id=self.tokenizer.image_id,
     96     )

File ~/projects/torchtune/torchtune/models/llama3/_model_builders.py:87, in llama3_tokenizer(path, special_tokens_path, max_seq_len, prompt_template)
     68 def llama3_tokenizer(path: str, special_tokens_path: Optional[str] = None, max_seq_len: Optional[int] = None, prompt_template: Optional[_TemplateType] = None) -> Llama3Tokenizer:
     69     """
     70     Tokenizer for Llama3.
     71
   (...)
     85         Llama3Tokenizer: Instantiation of the Llama3 tokenizer
     86     """
---> 87     special_tokens = parse_hf_tokenizer_json(special_tokens_path) if special_tokens_path is not None else None
     88     template = _get_prompt_template(prompt_template) if prompt_template is not None else None
     89     return Llama3Tokenizer(path=path, special_tokens=special_tokens, max_seq_len=max_seq_len, prompt_template=template)

File ~/projects/torchtune/torchtune/modules/tokenizers/_utils.py:194, in parse_hf_tokenizer_json(tokenizer_json_path)
    183 def parse_hf_tokenizer_json(tokenizer_json_path: str) -> Dict[str, int]:
    184     """
    185     Parse the ``tokenizer.json`` file from a Hugging Face model to extract the
    186     special token str to id mapping.
   (...)
    192         Dict[str, int]: The special token str to id mapping.
    193     """
--> 194     with open(tokenizer_json_path, "r") as f:
    195         tokenizer_json = json.load(f)
    197     return {token["content"]: token["id"] for token in tokenizer_json["added_tokens"]}

TypeError: expected str, bytes or os.PathLike object, not dict

On this branch:

from torchtune.models.llama3_2_vision import llama3_2_vision_transform
# special_tokens.json reformatted from https://huggingface.co/meta-llama/Llama-3.2-11B-Vision/raw/main/tokenizer_config.json
transform = llama3_2_vision_transform(
    path="./Llama-3.2-11B-Vision-Instruct/original/tokenizer.model",
    special_tokens_path="./special_tokens.json"
)
# success!

     template = (
         _get_prompt_template(prompt_template) if prompt_template is not None else None
     )
     return Llama3VisionTransform(
         path=path,
-        special_tokens=special_tokens,
+        special_tokens_path=special_tokens_path,
Contributor

yes I think this makes sense, the path should only be loaded in the tokenizer builder directly
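That fixed flow can be sketched in isolation; the function names below are stand-ins, not the real torchtune builders. The outer builder forwards the path untouched, and only the tokenizer builder opens the file:

```python
import json
import tempfile

def tokenizer_builder(special_tokens_path=None):
    # Stand-in for llama3_tokenizer: the only layer that opens the file.
    if special_tokens_path is None:
        return None
    with open(special_tokens_path, "r") as f:
        return {t["content"]: t["id"] for t in json.load(f)["added_tokens"]}

def fixed_transform_builder(special_tokens_path=None):
    # After the fix: forward the path as-is, no parsing in this layer.
    return tokenizer_builder(special_tokens_path=special_tokens_path)

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"added_tokens": [{"content": "<|image|>", "id": 128256}]}, f)
    json_path = f.name

tokens = fixed_transform_builder(special_tokens_path=json_path)
```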

@SalmanMohammadi SalmanMohammadi merged commit e61def1 into pytorch:main Nov 14, 2024
17 checks passed
@SalmanMohammadi SalmanMohammadi deleted the vision_transform_doc_nit branch November 14, 2024 15:47