Unable to use BLIP2 with caption_coco_opt6.7b at HEAD via salesforce-lavis (also HEAD) #21713
Comments
Hey @AstraliteHeart 👋 This issue seems to be a duplicate of #21599, which is fixed. Can I ask you to try to run your script using
I don't think this is a duplicate; my env is past that fix (see p4 in the original repro steps). I've updated from
Thank you for confirming @AstraliteHeart 🤗 I will dig deeper and let you know what I find!
After some digging, we can see that the exception is raised as follows:

/home/joao/hf/lib/python3.10/site-packages/lavis/models/blip2_models/modeling_opt.py:703 in forward

    700             inputs_embeds = self.embed_tokens(input_ids)
    701
    702         if query_embeds is not None:
    703 ❱           inputs_embeds = torch.cat([query_embeds, inputs_embeds], dim=1)
    704             input_shape = inputs_embeds.size()[:-1]
    705
    706         # embed positions
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 25 but got size 5 for tensor number 1 in the list.

From the full stack trace, we can conclude that the error arises from an issue in

@AstraliteHeart This means you have two options:
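For illustration, here is a minimal sketch that reproduces the same class of RuntimeError. The 25-vs-5 split below is hypothetical (e.g. a beam-expanded batch next to an unexpanded one), not a claim about the exact root cause in lavis — `torch.cat` along dim=1 simply requires every other dimension, including batch, to match:

```python
import torch

# Hypothetical shapes that trigger the same error message:
query_embeds = torch.randn(25, 32, 768)   # (expanded batch, query_len, hidden)
inputs_embeds = torch.randn(5, 8, 768)    # (batch, seq_len, hidden)

try:
    torch.cat([query_embeds, inputs_embeds], dim=1)
except RuntimeError as e:
    print(f"fails as in the issue: {e}")

# Once the batch dimensions agree, the concatenation succeeds:
inputs_embeds = inputs_embeds.repeat_interleave(5, dim=0)  # -> (25, 8, 768)
merged = torch.cat([query_embeds, inputs_embeds], dim=1)
print(merged.shape)  # torch.Size([25, 40, 768])
```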
@gante thank you for debugging! I can confirm that syncing to before #21405 (edc1e73) works; I'll open an issue on the SF side to warn them about the breakage. Unfortunately, this brings me back to the original issue of trying to use

I've tried both

what combination of versions of
Hi, thanks for converting BLIP2 to HF :) I actually forked the LAVIS repo and made some tweaks to facilitate conversion (I removed a bunch of unnecessary requirements, etc.). See here.
Hi Niels, thank you for checking this. I did use your fork (or so I thought, sigh), but I redid everything from scratch while comparing traces with code and, well... it turned out I had moved my blip2 conversion script to the LAVIS git root folder, which kept including their model (as it's in the

I can now confirm that with your fork I was able to convert my model with a snapshot before #21405 and load it in 8-bit with the latest

Do you have any guidance on matching outputs between the lavis and hf models? I ran about 50 samples through lavis/hf16/hf8 and, while hf16 and hf8 are mostly consistent (good), the lavis output is better in all cases (see anecdotal examples below).

Here is roughly how I load and run all models: https://gist.github.com/AstraliteHeart/4d7ebf834021b8e1c9bc439c1633002c

I tried to make sure all settings and random seeds match, but perhaps I am missing something?

https://derpicdn.net/img/view/2023/2/23/3051871.png
https://derpicdn.net/img/2017/7/7/1480500/large.png
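For reference, a minimal sketch of the kind of "seed everything" helper used when comparing two implementations (the function name is mine, not from the gist). Greedy and beam search are deterministic, so seeding mostly matters when `do_sample=True`, but pinning all RNGs rules sampling out as the source of the differences:

```python
import random

import numpy as np
import torch

def set_all_seeds(seed: int = 42) -> None:
    """Seed every RNG that can influence generation, so remaining
    lavis-vs-hf differences come from the models, not from sampling."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # also seeds CUDA generators
    torch.cuda.manual_seed_all(seed)  # explicit, for older torch versions

# Identical seeds must give identical draws across runs:
set_all_seeds(0)
a = torch.rand(3)
set_all_seeds(0)
b = torch.rand(3)
print(torch.equal(a, b))  # True
```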
Thanks for reporting, that should not be the case! I extensively tested the greedy/beam search outputs on the original vs my implementation to make sure everything works as expected. But the generate method has had some updates since, so there might be a small issue. However, isn't it weird that the first token is already different? cc'ing @gante here
Also, I'm not sure you can run both LAVIS and the Transformers main branch in the same environment to compare, because LAVIS relies on an older version of Transformers.
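Given the version conflict, each implementation typically needs its own environment, and it's worth confirming which copy of a package each one actually imports (this is exactly the local-checkout-shadows-installed-package mixup described above). A small hypothetical helper, demonstrated on a stdlib module so it runs anywhere:

```python
import importlib.util

def module_origin(name: str) -> str:
    """Return the file an import of `name` would resolve to; useful for
    catching a repo checkout shadowing the installed package."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "<not installed>"
    return spec.origin or "<namespace package>"

# In the real setup you would inspect module_origin("lavis") and
# module_origin("transformers") in each environment; stdlib 'json'
# keeps this example self-contained:
print(module_origin("json"))
```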
Results on top are from

Some more tests (tl;dr: the latest transformers still does not produce the same output).

Official:

Latest transformers:
Hey @AstraliteHeart 👋 Differences in generation can be explained by many parts of the stack, from ninja numerical bugs to intentional implementation quirks. Debugging the exact cause takes time, so I want to ask for your help :D
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Can you please explain how you managed to convert this? I am also stuck. Is there a specific transformers version required?
I have a PR here which aims to further verify equivalence: #24854. The conversion script can be found here and can be run as follows:
The reason I forked LAVIS is to make sure I can compare both implementations using float32.
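The equivalence check behind that PR boils down to comparing logits from both implementations in float32. A minimal sketch of the idea, with random tensors standing in for "the same inputs through both models" (the helper name and tolerance are mine):

```python
import torch

def max_logit_diff(logits_a: torch.Tensor, logits_b: torch.Tensor) -> float:
    """Max absolute difference between two implementations' logits,
    compared in float32 so half-precision rounding neither hides
    nor creates a divergence."""
    return (logits_a.float() - logits_b.float()).abs().max().item()

# Hypothetical stand-ins for the two models' outputs on one input:
a = torch.randn(1, 5, 100)
b = a + 1e-6 * torch.randn_like(a)

diff = max_logit_diff(a, b)
print(diff)
assert diff < 1e-4, "implementations diverge beyond float32 tolerance"
```

Even a tiny logit difference can flip the argmax of a near-tie token, which then changes every subsequent token in greedy decoding — one reason outputs can differ from the very first token.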
System Info

working:
transformers version: 4.26.1

broken:
transformers version: 4.27.0.dev0

Who can help?
@gante @NielsRogge
Information

Tasks

examples folder (such as GLUE/SQuAD, ...)

Reproduction
- python test_simple.py, model is correctly loaded and prints a caption
- pip install --upgrade git+https://github.com/huggingface/transformers (I wanted the new shiny blip2 conversion script so I can convert my finetuned model into HF format)
- Resolved https://github.com/huggingface/transformers to commit 8b3db33a763ccef828fca89bac7e6cbff314f131
- python test_simple.py
- RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 25 but got size 5 for tensor number 1 in the list.
Expected behavior
Can use BLIP2 with latest HF
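Until the regression is resolved, a minimal sketch of a fail-fast version guard based on the versions reported in this issue (the helper names are mine; 4.26.1 is simply the last release reported working here, and a `.devN` suffix sorts just below its final release):

```python
def parse_version(v: str) -> tuple:
    """Minimal parser good enough to compare versions like
    '4.26.1' and '4.27.0.dev0'."""
    core, _, dev = v.partition(".dev")
    parts = tuple(int(x) for x in core.split("."))
    return parts + ((-1,) if dev != "" else (0,))

KNOWN_GOOD = "4.26.1"  # last transformers release reported working above

def predates_breakage(installed: str) -> bool:
    """True if `installed` (e.g. transformers.__version__) is at or
    before the known-good release."""
    return parse_version(installed) <= parse_version(KNOWN_GOOD)

print(predates_breakage("4.26.1"))       # True
print(predates_breakage("4.27.0.dev0"))  # False
```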