Fixing `special_tokens` arg in Llama3VisionTransform #2000
Conversation
🔗 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2000.
✅ No failures as of commit 78249a6 with merge base 18d97f0. (Generated by Dr. CI, updated every 15 minutes.)
Just before I approve it: was it tested? Could you add a bit of context about how you found this issue?
Thank you :)
special_tokens_path (Optional[str]): Path to ``tokenizer.json`` from Hugging Face
    model files that contains all registered special tokens, or a local json file
    structured similarly. Default is None to use the canonical Llama3 special tokens.
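For reference, a minimal sketch (with a hypothetical helper name) of how such a Hugging Face-style file can be parsed into a string-to-id mapping — mirroring the `parse_hf_tokenizer_json` behavior visible in the traceback later in this thread:

```python
import json
from typing import Dict


def parse_special_tokens_json(tokenizer_json_path: str) -> Dict[str, int]:
    # Read a tokenizer.json-style file and map each registered special
    # token's string content to its integer id (the same shape that
    # torchtune's parse_hf_tokenizer_json returns).
    with open(tokenizer_json_path, "r") as f:
        tokenizer_json = json.load(f)
    return {token["content"]: token["id"] for token in tokenizer_json["added_tokens"]}
```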
this makes sense to me! Do we know how this happened? i.e. is it possible that this is a bug in other models too?
I think we only have the Llama3.2 Transform so it only applies to this model
Thanks for the fix as always
I did indeed test it out manually. I was trying to add special tokens when using this transform, but I found the argument wasn't named/typed correctly (it should be identical to the Llama3 tokenizer builder). This PR changes the name/typing/docstring to match - the functionality was correct as-is.
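To illustrate the mismatch in isolation (dict contents here are purely illustrative): the transform forwarded an already-parsed mapping to a parameter that expects a file path, so the underlying `open()` call fails with the same `TypeError` seen in the traceback.

```python
# Illustrative only: an already-parsed special-tokens mapping, not a path.
special_tokens = {"<|begin_of_text|>": 128000}

try:
    # open() expects str/bytes/os.PathLike, so handing it a dict raises
    # the same TypeError as in the traceback.
    open(special_tokens, "r")
except TypeError as err:
    print(err)  # expected str, bytes or os.PathLike object, not dict
```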
On main:

```python
from torchtune.models.llama3_2_vision import llama3_2_vision_transform

# special_tokens.json reformatted from https://huggingface.co/meta-llama/Llama-3.2-11B-Vision/raw/main/tokenizer_config.json
transform = llama3_2_vision_transform(
    path="./Llama-3.2-11B-Vision-Instruct/original/tokenizer.model",
    special_tokens_path="./special_tokens.json"
)
```
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[14], line 3
      1 from torchtune.models.llama3_2_vision import llama3_2_vision_transform
----> 3 transform = llama3_2_vision_transform(
      4     path="./Llama-3.2-11B-Vision-Instruct/original/tokenizer.model",
      5     special_tokens_path="./special_tokens.json"
      6 )

File ~/projects/torchtune/torchtune/models/llama3_2_vision/_model_builders.py:61, in llama3_2_vision_transform(path, max_seq_len, image_size, special_tokens_path, prompt_template)
     53 special_tokens = (
     54     parse_hf_tokenizer_json(special_tokens_path)
     55     if special_tokens_path is not None
     56     else None
     57 )
     58 template = (
     59     _get_prompt_template(prompt_template) if prompt_template is not None else None
     60 )
---> 61 return Llama3VisionTransform(
     62     path=path,
     63     special_tokens=special_tokens,
     64     tile_size=image_size,
     65     patch_size=14,
     66     max_num_tiles=4,
     67     max_seq_len=max_seq_len,
     68     image_mean=(0.48145466, 0.4578275, 0.40821073),
     69     image_std=(0.26862954, 0.26130258, 0.27577711),
     70     prompt_template=template,
     71 )

File ~/projects/torchtune/torchtune/models/llama3_2_vision/_transform.py:77, in Llama3VisionTransform.__init__(self, path, tile_size, patch_size, max_num_tiles, special_tokens, max_seq_len, image_mean, image_std, prompt_template)
     64 def __init__(
     65     self,
     66     path: str,
    (...)
     75     prompt_template: Optional[PromptTemplate] = None,
     76 ):
---> 77     self.tokenizer = llama3_tokenizer(
     78         path,
     79         special_tokens_path=special_tokens,
     80         max_seq_len=max_seq_len,
     81         prompt_template=prompt_template,
     82     )
     83     self.transform_image = CLIPImageTransform(
     84         image_mean=image_mean,
     85         image_std=image_std,
    (...)
     90         resize_to_max_canvas=False,
     91     )
     92     self.xattn_mask = VisionCrossAttentionMask(
     93         tile_size=tile_size,
     94         patch_size=patch_size,
     95         image_token_id=self.tokenizer.image_id,
     96     )

File ~/projects/torchtune/torchtune/models/llama3/_model_builders.py:87, in llama3_tokenizer(path, special_tokens_path, max_seq_len, prompt_template)
     68 def llama3_tokenizer(path: str, special_tokens_path: Optional[str] = None, max_seq_len: Optional[int] = None, prompt_template: Optional[_TemplateType] = None) -> Llama3Tokenizer:
     69     """
     70     Tokenizer for Llama3.
     71
    (...)
     85         Llama3Tokenizer: Instantiation of the Llama3 tokenizer
     86     """
---> 87     special_tokens = parse_hf_tokenizer_json(special_tokens_path) if special_tokens_path is not None else None
     88     template = _get_prompt_template(prompt_template) if prompt_template is not None else None
     89     return Llama3Tokenizer(path=path, special_tokens=special_tokens, max_seq_len=max_seq_len, prompt_template=template)

File ~/projects/torchtune/torchtune/modules/tokenizers/_utils.py:194, in parse_hf_tokenizer_json(tokenizer_json_path)
    183 def parse_hf_tokenizer_json(tokenizer_json_path: str) -> Dict[str, int]:
    184     """
    185     Parse the ``tokenizer.json`` file from a Hugging Face model to extract the
    186     special token str to id mapping.
    (...)
    192         Dict[str, int]: The special token str to id mapping.
    193     """
--> 194     with open(tokenizer_json_path, "r") as f:
    195         tokenizer_json = json.load(f)
    197     return {token["content"]: token["id"] for token in tokenizer_json["added_tokens"]}

TypeError: expected str, bytes or os.PathLike object, not dict
```

On this branch:

```python
from torchtune.models.llama3_2_vision import llama3_2_vision_transform

# special_tokens.json reformatted from https://huggingface.co/meta-llama/Llama-3.2-11B-Vision/raw/main/tokenizer_config.json
transform = llama3_2_vision_transform(
    path="./Llama-3.2-11B-Vision-Instruct/original/tokenizer.model",
    special_tokens_path="./special_tokens.json"
)
# success!
```
```diff
 template = (
     _get_prompt_template(prompt_template) if prompt_template is not None else None
 )
 return Llama3VisionTransform(
     path=path,
-    special_tokens=special_tokens,
+    special_tokens_path=special_tokens_path,
```
Yes, I think this makes sense: the path should only be loaded in the tokenizer builder directly.
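A minimal sketch (hypothetical function names, not torchtune's API) of the design this comment describes: outer builders forward the path untouched, and only the innermost tokenizer builder opens and parses the file, so the JSON is read in exactly one place.

```python
import json
from typing import Dict, Optional


def _parse_special_tokens(path: str) -> Dict[str, int]:
    # The only place the file is opened and parsed.
    with open(path, "r") as f:
        return {t["content"]: t["id"] for t in json.load(f)["added_tokens"]}


def build_tokenizer(path: str, special_tokens_path: Optional[str] = None) -> dict:
    # Innermost builder: resolves the optional path into a parsed mapping.
    special_tokens = (
        _parse_special_tokens(special_tokens_path)
        if special_tokens_path is not None
        else None
    )
    return {"path": path, "special_tokens": special_tokens}


def build_transform(path: str, special_tokens_path: Optional[str] = None) -> dict:
    # Outer builder: forwards the *path* unchanged instead of parsing it
    # itself, avoiding the double-parse/type-mismatch fixed in this PR.
    return build_tokenizer(path, special_tokens_path=special_tokens_path)
```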
Context
What is the purpose of this PR? Is it to