
Fixing special_tokens arg in Llama3VisionTransform #2000

Merged

Conversation

SalmanMohammadi
Collaborator

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)
                 ()
                /\
               //\\
              <<  >>
          ()   \\//   ()
()._____   /\   \\   /\   _____.()
   \.--.\ //\\ //\\ //\\ /.--./
    \\__\\/__\//__\//__\\/__//
     '--/\\--//\--//\--/\\--'
        \\\\///\\//\\\////
    ()-= >>\\< <\\> >\\<< =-()
        ////\\\//\\///\\\\
     .--\\/--\//--\//--\//--.
    //""/\\""//\""//\""//\""\\
   /'--'/ \\// \\// \\// \'--'\
 ()`"""`   \/   //   \/   `""""`()
          ()   //\\   ()
              <<  >>
               \\//
                \/
                ()


pytorch-bot bot commented Nov 13, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2000

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 78249a6 with merge base 18d97f0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 13, 2024
Contributor

@felipemello1 left a comment


just before i approve it, was it tested? Could you add a bit of context about how you found this issue?

Thank you :))

Comment on lines +38 to +40
special_tokens_path (Optional[str]): Path to ``tokenizer.json`` from Hugging Face
model files that contains all registered special tokens, or a local json file
structured similarly. Default is None to use the canonical Llama3 special tokens.
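For context on what that docstring is describing: torchtune's `parse_hf_tokenizer_json` (its source appears in the traceback further down) reads the `added_tokens` list out of such a JSON file and returns a token-string-to-id mapping. A minimal self-contained sketch of that behavior; the helper name `parse_special_tokens` and the temp-file setup are illustrative, and the two token ids shown are the canonical Llama3 ones:

```python
import json
import os
import tempfile

def parse_special_tokens(tokenizer_json_path: str) -> dict:
    # Mirrors torchtune's parse_hf_tokenizer_json: pull the
    # special-token str -> id mapping out of "added_tokens".
    with open(tokenizer_json_path, "r") as f:
        tokenizer_json = json.load(f)
    return {t["content"]: t["id"] for t in tokenizer_json["added_tokens"]}

# Minimal file with the structure the parser expects.
sample = {
    "added_tokens": [
        {"content": "<|begin_of_text|>", "id": 128000},
        {"content": "<|end_of_text|>", "id": 128001},
    ]
}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(sample, f)
    sample_path = f.name

special_tokens = parse_special_tokens(sample_path)
os.unlink(sample_path)
```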
Contributor


this makes sense to me! Do we know how this happened? i.e. is it possible that this is a bug in other models too?

Contributor


I think we only have the Llama3.2 Transform so it only applies to this model

Contributor

@RdoubleA left a comment


Thanks for the fix as always

@SalmanMohammadi
Collaborator Author

just before i approve it, was it tested? Could you add a bit of context about how you found this issue?

Thank you :))

I did indeed test it out manually. I was trying to add special tokens when using this transform, but I found the argument wasn't named/typed correctly (it should be identical to the Llama3 tokenizer builder). This PR changes the name/typing/docstring to match - the functionality was correct as-is.

@SalmanMohammadi
Collaborator Author

SalmanMohammadi commented Nov 14, 2024

Don't merge just yet - I just tested the 3.2 vision transform builder too. It looks like we do parse the special tokens JSON into a mapping here, but then pass that mapping as the special_tokens_path: str arg to the underlying tokenizer. This should be changed too, right? @RdoubleA
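The double-parse bug described above can be sketched in isolation; the function names here are stand-ins for the real torchtune builders, not the actual API. The outer builder parses the JSON into a dict, then forwards that dict to a parameter that expects a path, so `open()` raises the TypeError shown in the traceback below:

```python
import json
import tempfile

def parse_json(path):
    # Stand-in for parse_hf_tokenizer_json: opens the file, so passing
    # a dict instead of a path raises TypeError inside open().
    with open(path, "r") as f:
        return {t["content"]: t["id"] for t in json.load(f)["added_tokens"]}

def tokenizer_builder(special_tokens_path=None):
    # Stand-in for llama3_tokenizer: parses the path itself.
    return parse_json(special_tokens_path) if special_tokens_path is not None else None

def broken_transform_builder(special_tokens_path=None):
    # The bug on main: parse here, then pass the resulting *dict*
    # on as if it were still a path.
    special_tokens = (
        parse_json(special_tokens_path) if special_tokens_path is not None else None
    )
    return tokenizer_builder(special_tokens_path=special_tokens)

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"added_tokens": [{"content": "<|image|>", "id": 128256}]}, f)
    json_path = f.name

try:
    broken_transform_builder(special_tokens_path=json_path)
except TypeError as e:
    print(e)  # expected str, bytes or os.PathLike object, not dict
```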

@SalmanMohammadi
Collaborator Author

On main:

from torchtune.models.llama3_2_vision import llama3_2_vision_transform
# special_tokens.json reformatted from https://huggingface.co/meta-llama/Llama-3.2-11B-Vision/raw/main/tokenizer_config.json
transform = llama3_2_vision_transform(
    path="./Llama-3.2-11B-Vision-Instruct/original/tokenizer.model",
    special_tokens_path="./special_tokens.json"
)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[14], line 3
      1 from torchtune.models.llama3_2_vision import llama3_2_vision_transform
----> 3 transform = llama3_2_vision_transform(
      4     path="./Llama-3.2-11B-Vision-Instruct/original/tokenizer.model",
      5     special_tokens_path="./special_tokens.json"
      6 )

File ~/projects/torchtune/torchtune/models/llama3_2_vision/_model_builders.py:61, in llama3_2_vision_transform(path, max_seq_len, image_size, special_tokens_path, prompt_template)
     53 special_tokens = (
     54     parse_hf_tokenizer_json(special_tokens_path)
     55     if special_tokens_path is not None
     56     else None
     57 )
     58 template = (
     59     _get_prompt_template(prompt_template) if prompt_template is not None else None
     60 )
---> 61 return Llama3VisionTransform(
     62     path=path,
     63     special_tokens=special_tokens,
     64     tile_size=image_size,
     65     patch_size=14,
     66     max_num_tiles=4,
     67     max_seq_len=max_seq_len,
     68     image_mean=(0.48145466, 0.4578275, 0.40821073),
     69     image_std=(0.26862954, 0.26130258, 0.27577711),
     70     prompt_template=template,
     71 )

File ~/projects/torchtune/torchtune/models/llama3_2_vision/_transform.py:77, in Llama3VisionTransform.__init__(self, path, tile_size, patch_size, max_num_tiles, special_tokens, max_seq_len, image_mean, image_std, prompt_template)
     64 def __init__(
     65     self,
     66     path: str,
   (...)
     75     prompt_template: Optional[PromptTemplate] = None,
     76 ):
---> 77     self.tokenizer = llama3_tokenizer(
     78         path,
     79         special_tokens_path=special_tokens,
     80         max_seq_len=max_seq_len,
     81         prompt_template=prompt_template,
     82     )
     83     self.transform_image = CLIPImageTransform(
     84         image_mean=image_mean,
     85         image_std=image_std,
   (...)
     90         resize_to_max_canvas=False,
     91     )
     92     self.xattn_mask = VisionCrossAttentionMask(
     93         tile_size=tile_size,
     94         patch_size=patch_size,
     95         image_token_id=self.tokenizer.image_id,
     96     )

File ~/projects/torchtune/torchtune/models/llama3/_model_builders.py:87, in llama3_tokenizer(path, special_tokens_path, max_seq_len, prompt_template)
     68 def llama3_tokenizer(path: str, special_tokens_path: Optional[str] = None, max_seq_len: Optional[int] = None, prompt_template: Optional[_TemplateType] = None) -> Llama3Tokenizer:
     69     """
     70     Tokenizer for Llama3.
     71
   (...)
     85         Llama3Tokenizer: Instantiation of the Llama3 tokenizer
     86     """
---> 87     special_tokens = parse_hf_tokenizer_json(special_tokens_path) if special_tokens_path is not None else None
     88     template = _get_prompt_template(prompt_template) if prompt_template is not None else None
     89     return Llama3Tokenizer(path=path, special_tokens=special_tokens, max_seq_len=max_seq_len, prompt_template=template)

File ~/projects/torchtune/torchtune/modules/tokenizers/_utils.py:194, in parse_hf_tokenizer_json(tokenizer_json_path)
    183 def parse_hf_tokenizer_json(tokenizer_json_path: str) -> Dict[str, int]:
    184     """
    185     Parse the ``tokenizer.json`` file from a Hugging Face model to extract the
    186     special token str to id mapping.
   (...)
    192         Dict[str, int]: The special token str to id mapping.
    193     """
--> 194     with open(tokenizer_json_path, "r") as f:
    195         tokenizer_json = json.load(f)
    197     return {token["content"]: token["id"] for token in tokenizer_json["added_tokens"]}

TypeError: expected str, bytes or os.PathLike object, not dict

On this branch:

from torchtune.models.llama3_2_vision import llama3_2_vision_transform
# special_tokens.json reformatted from https://huggingface.co/meta-llama/Llama-3.2-11B-Vision/raw/main/tokenizer_config.json
transform = llama3_2_vision_transform(
    path="./Llama-3.2-11B-Vision-Instruct/original/tokenizer.model",
    special_tokens_path="./special_tokens.json"
)
# success!

     template = (
         _get_prompt_template(prompt_template) if prompt_template is not None else None
     )
     return Llama3VisionTransform(
         path=path,
-        special_tokens=special_tokens,
+        special_tokens_path=special_tokens_path,
Contributor

yes I think this makes sense, the path should only be loaded in the tokenizer builder directly
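That fixed flow can be sketched in isolation; the function names below are stand-ins, not the real torchtune builders. The outer builder forwards the path untouched, and only the tokenizer builder opens the file:

```python
import json
import tempfile

def tokenizer_builder(special_tokens_path=None):
    # Stand-in for llama3_tokenizer: the only layer that opens the file.
    if special_tokens_path is None:
        return None
    with open(special_tokens_path, "r") as f:
        return {t["content"]: t["id"] for t in json.load(f)["added_tokens"]}

def fixed_transform_builder(special_tokens_path=None):
    # After the fix: forward the path as-is, no parsing in this layer.
    return tokenizer_builder(special_tokens_path=special_tokens_path)

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"added_tokens": [{"content": "<|image|>", "id": 128256}]}, f)
    json_path = f.name

tokens = fixed_transform_builder(special_tokens_path=json_path)
```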

@SalmanMohammadi SalmanMohammadi merged commit e61def1 into pytorch:main Nov 14, 2024
17 checks passed
@SalmanMohammadi SalmanMohammadi deleted the vision_transform_doc_nit branch November 14, 2024 15:47