Unable to use BLIP2 with caption_coco_opt6.7b at HEAD via salesforce-lavis (also HEAD) #21713

Closed
1 of 4 tasks
AstraliteHeart opened this issue Feb 21, 2023 · 15 comments

Comments

@AstraliteHeart

System Info

working:

  • transformers version: 4.26.1
  • Platform: Linux-6.0.12-x86_64-with-glibc2.10
  • Python version: 3.8.16
  • Huggingface_hub version: 0.12.0
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

broken:

  • transformers version: 4.27.0.dev0
  • Platform: Linux-6.0.12-x86_64-with-glibc2.10
  • Python version: 3.8.16
  • Huggingface_hub version: 0.12.0
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help?

@gante @NielsRogge

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Start with clean env setup via https://github.com/salesforce/LAVIS/blob/main/requirements.txt (transformers-4.26.1)
  2. Run python test_simple.py, model is correctly loaded and prints a caption
  3. pip install --upgrade git+https://github.com/huggingface/transformers (I wanted the shiny new blip2 conversion script so I can convert my finetuned model into HF format)
  4. Resolved https://github.com/huggingface/transformers to commit 8b3db33a763ccef828fca89bac7e6cbff314f131
  5. Run python test_simple.py
  6. RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 25 but got size 5 for tensor number 1 in the list.
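test_simple.py:

import requests
import torch
from lavis.models import load_model_and_preprocess
from PIL import Image

# Use the GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load the LAVIS BLIP2 captioning model together with its image preprocessors.
model, vis_processors, _ = load_model_and_preprocess(name="blip2_opt", model_type="caption_coco_opt6.7b", is_eval=True, device=device)

url = "..."
# Download the image and convert it to RGB.
raw_image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
# Preprocess, add a batch dimension, and move the tensor to the target device.
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
# Generate captions with the model's default generation settings.
data = model.generate({"image": image})
print(data)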

Expected behavior

Can use BLIP2 with latest HF

@sgugger
Collaborator

sgugger commented Feb 21, 2023

cc @younesbelkada

@gante
Member

gante commented Feb 21, 2023

Hey @AstraliteHeart 👋 This issue seems to be a duplicate of #21599, which is fixed.

Can I ask you to try to run your script using transformers main branch, i.e. after installing with pip install --upgrade git+https://github.com/huggingface/transformers.git?

@AstraliteHeart
Author

I don't think this is a duplicate; my env is past that fix (see step 4 in the original repro steps). I've updated from main to confirm, as follows:

  1. pip install --upgrade git+https://github.com/huggingface/transformers.git
  2. Resolved https://github.com/huggingface/transformers.git to commit bb5a2f2fc30985841289207b9f1f7765d8abc4e0
  3. python test_simple.py
  4. RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 25 but got size 5 for tensor number 1 in the list.

@gante
Member

gante commented Feb 22, 2023

Thank you for confirming @AstraliteHeart 🤗 I will dig deeper and let you know what I find!

@gante
Member

gante commented Feb 22, 2023

After some digging, we can see that the exception is raised as follows:

/home/joao/hf/lib/python3.10/site-packages/lavis/models/blip2_models/modeling_opt.py:703 in forward

    700             inputs_embeds = self.embed_tokens(input_ids)
    701
    702         if query_embeds is not None:
  ❱ 703             inputs_embeds = torch.cat([query_embeds, inputs_embeds], dim=1)
    704             input_shape = inputs_embeds.size()[:-1]
    705
    706         # embed positions

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 25 but got size 5 for tensor number 1 in the list.

From the full stack trace, we can conclude that the error arises from an issue in lavis, and not in transformers :) The root cause is something that we addressed in this PR: lavis uses a modified OPT model to handle the image embeddings, whereas we decided to update .generate() to handle soft-prompting.
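The shape rule behind this error can be illustrated without loading any model. The sketch below is a pure-Python stand-in for the check that torch.cat performs. The helper name cat_shape and the concrete shapes (32 query tokens, 10 text tokens, hidden size 4096) are assumptions chosen for illustration; only the mismatching leading sizes 25 and 5 come from the error message above. One plausible reading, not confirmed in this thread, is that with 5 beams one input gets expanded twice (5 × 5 = 25) while the other is expanded once.

```python
from typing import Sequence, Tuple


def cat_shape(shapes: Sequence[Tuple[int, ...]], dim: int) -> Tuple[int, ...]:
    """Return the shape a concatenation along `dim` would have.

    Mirrors the rule that every dimension except `dim` must match.
    """
    first = shapes[0]
    total = 0
    for index, shape in enumerate(shapes):
        if len(shape) != len(first):
            raise ValueError("all tensors must have the same number of dimensions")
        for axis, (expected, got) in enumerate(zip(first, shape)):
            if axis != dim and expected != got:
                raise ValueError(
                    f"Sizes of tensors must match except in dimension {dim}. "
                    f"Expected size {expected} but got size {got} "
                    f"for tensor number {index} in the list."
                )
        total += shape[dim]
    return first[:dim] + (total,) + first[dim + 1:]


# Hypothetical shapes, laid out as (batch, tokens, hidden).
query_embeds = (25, 32, 4096)   # batch dimension of size 25
inputs_embeds = (5, 10, 4096)   # batch dimension of size 5

try:
    cat_shape([query_embeds, inputs_embeds], dim=1)
except ValueError as err:
    message = str(err)
    print(message)

# With matching batch sizes the concatenation along dim=1 is well defined.
ok_shape = cat_shape([(5, 32, 4096), (5, 10, 4096)], dim=1)
print(ok_shape)
```

Concatenating along dim=1 only stacks the token axis, so any disagreement in the batch axis is rejected before the model can run.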

@AstraliteHeart This means you have two options:

  1. Update your code to rely on transformers, as opposed to lavis. See here for examples.
  2. Open an issue in lavis, so they can help you with this issue :)
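For the first option, a minimal sketch of captioning with the transformers BLIP-2 classes might look like the following. It assumes that the Hub checkpoint "Salesforce/blip2-opt-6.7b-coco" is the counterpart of caption_coco_opt6.7b; the function name caption_image, the image path argument, and the max_new_tokens value are hypothetical choices, not something taken from this thread.

```python
def caption_image(image_path: str, model_id: str = "Salesforce/blip2-opt-6.7b-coco") -> str:
    """Caption one local image with the transformers BLIP-2 classes."""
    # Imports are local so that defining the function does not need the heavy dependencies.
    import torch
    from PIL import Image
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    # Half precision on GPU, full precision on CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    processor = Blip2Processor.from_pretrained(model_id)
    model = Blip2ForConditionalGeneration.from_pretrained(model_id, torch_dtype=dtype).to(device)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    # The processor resizes and normalizes the image; floating point tensors are cast to `dtype`.
    inputs = processor(images=image, return_tensors="pt").to(device, dtype)

    with torch.no_grad():
        # do_sample=False keeps decoding deterministic (assumed setting for comparisons).
        generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=30)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

Calling caption_image("example.jpg") with a local file of your own would download the checkpoint on first use, so expect a large download and significant memory use.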

@AstraliteHeart
Author

@gante thank you for debugging!

I can confirm that syncing to a commit before #21405 (edc1e73) works. I'll open an issue on the Salesforce side to warn them about the breakage. Unfortunately, this brings me back to the original problem of trying to use convert_blip_2_original_to_pytorch.py. Perhaps you can help me figure out how the BLIP2 models were converted? (I understand this is irrelevant to most users and matters only to the few brave souls who finetune BLIP2 via LAVIS and then want to load it in HF.)

I've tried both pip install git+https://github.com/nielsrogge/LAVIS.git@fix_lavis (mentioned in the script) and lavis from HEAD, but I am getting this trace:

$ python ./convert_blip_2_original_to_pytorch.py
Loading original model...
Position interpolate from 16x16 to 26x26
tokenizer facebook/opt-6.7b
Loading checkpoint shards: Done!
Traceback (most recent call last):
  File "./convert_blip_2_original_to_pytorch.py", line 304, in <module>
    convert_blip2_checkpoint(args.model_name, args.pytorch_dump_folder_path, args.push_to_hub)
  File "/.../envs/lavis/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "./convert_blip_2_original_to_pytorch.py", line 216, in convert_blip2_checkpoint
    original_logits = original_logits.logits
AttributeError: 'dict' object has no attribute 'logits' // indeed, this is a dictionary containing only 'loss'

What combination of transformers and lavis versions was used during conversion?

@NielsRogge
Contributor

Hi,

Thanks for converting BLIP2 to HF :) I actually forked the LAVIS repo and made some tweaks to facilitate conversion (I removed a bunch of unnecessary requirements etc). See here.

@AstraliteHeart
Author

Hi Niels, thank you for checking this.

I did use your fork (or so I thought, sigh), but I redid everything from scratch while comparing traces with code and, well... it turned out I had moved my blip2 conversion script to the LAVIS git root folder, which kept importing their model (as it's in the lavis folder) even with your fixed one installed (so I do apologize).
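This kind of shadowing can be checked before running a script. The sketch below uses only the standard library to report which file a package name resolves to and whether that file lives under a given directory; the helper names package_origin and resolves_under are hypothetical, and "lavis" is just the package relevant to this thread.

```python
import importlib.util
import os
from typing import Optional


def package_origin(name: str) -> Optional[str]:
    """Return the file a top-level package would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def resolves_under(name: str, directory: str) -> bool:
    """True if `name` would be imported from somewhere inside `directory`."""
    origin = package_origin(name)
    # Built-in, frozen, and namespace packages have no absolute file origin.
    if origin is None or not os.path.isabs(origin):
        return False
    directory = os.path.realpath(directory)
    try:
        return os.path.commonpath([os.path.realpath(origin), directory]) == directory
    except ValueError:
        # Raised when the paths cannot be compared, e.g. different drives on Windows.
        return False


if __name__ == "__main__":
    print("lavis resolves to:", package_origin("lavis"))
    print("shadowed by current directory:", resolves_under("lavis", os.getcwd()))
```

When a script sits next to a folder named like an installed package, Python places the script's directory at the front of the import path, so the local folder tends to win over the installed copy.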

I can now confirm that with your fork I was able to convert my model with a snapshot before #21405 and load it in 8 bits with the latest bitsandbytes, keeping VRAM usage at 11.1GB (vs around 18.5GB without).

Do you have any guidance on matching outputs between lavis and hf models? I ran about 50 samples through lavis/hf16/hf8, and while hf16 and hf8 are mostly consistent (good), the lavis output is better in all cases (see anecdotal examples below).

Here is roughly how I load and run all models: https://gist.github.com/AstraliteHeart/4d7ebf834021b8e1c9bc439c1633002c. I tried to make sure all settings and random seeds match, but perhaps I am missing something?

https://derpicdn.net/img/view/2023/2/23/3051871.png

'caption_lavis': ['scootaloo, apple bloom, and applejack in a group hug scootaloo, apple bloom, and applejack are all smiling white background', 'scootaloo, applebloom, and applejack in a group hug scootaloo and applebloom are jumping applejack is smiling white background', 'scootaloo, apple bloom, and applejack in a group hug scootaloo, apple bloom, and applejack are jumping and smiling white background'],
'caption_hf_16': ['a series of images of sweetie belle, applejack, scootaloo, applebloom, rarity, pinkie pie, twilight sparkle, rarity, twilight sparkle, rarity, rarity, rarity, rarity, rarity, rarity', 'a series of images of sweetie belle, applejack, scootaloo, applebloom, rarity, pinkie pie, twilight sparkle, rarity, twilight sparkle, twilight sparkle, twilight sparkle, twilight sparkle', 'a series of images of sweetie belle, applejack, scootaloo, applebloom, rarity, pinkie pie, twilight sparkle, rarity, twilight sparkle, twilight sparkle, rarity, rarity, rarity, rarity'],
'caption_hf_8': ['a series of images of sweetie belle, applebloom, scootaloo, applejack, rarity, pinkie pie, twilight sparkle, fluttershy, rarity, pinkie pie, twilight sparkle, rarity, pink', 'a series of images of sweetie belle, applebloom, scootaloo, applejack, rarity, pinkie pie, twilight sparkle, fluttershy, rarity, pinkie pie, twilight sparkle, twilight sparkle', 'a series of images of sweetie belle, applebloom, scootaloo, applejack, rarity, pinkie pie, twilight sparkle, fluttershy, rarity, twilight sparkle, twilight sparkle, twilight sparkle']

https://derpicdn.net/img/2017/7/7/1480500/large.png

'caption_lavis': ['alicorn twilight sparkle is laying on her back with a book on her head and a book on her chest she is surrounded by books on the floor and on the walls she has a book on her head and a book on her chest she is', 'alicorn twilight sparkle is laying on her back with a book on her head and a book on her chest she is surrounded by books on the floor and on the walls she is also wearing a book on her head and a book on her chest', 'alicorn twilight sparkle is laying on her back with a book on her head and a book on her chest she is surrounded by books on the floor and on the walls she has a book on her head and a book on her chest she has'],
'caption_hf_16': ['posterior view of twilight sparkle lying on the floor surrounded by a pile of books, surrounded by a pile of books, surrounded by a pile of books, surrounded by a pile of books, surrounded by a pile of books, surrounded by', 'posterior view of twilight sparkle lying on the floor surrounded by a pile of books, surrounded by a pile of books, surrounded by a pile of books, surrounded by a pile of books, surrounded by a pile of books\n', 'posterior view of twilight sparkle lying on the floor surrounded by a pile of books, surrounded by a pile of books, surrounded by books, surrounded by books, surrounded by books, surrounded by books, surrounded by books, surrounded by books'],
'caption_hf_8': ['twilight sparkle is lying on the floor surrounded by a pile of books she is surrounded by a pile of books on the floor she is surrounded by a pile of books on the floor she is surrounded by a pile of books on the floor she','twilight sparkle is lying on the floor surrounded by a pile of books she is surrounded by a pile of books on top of her she is surrounded by a pile of books on top of her she is surrounded by a pile of books on top', 'twilight sparkle is lying on the floor surrounded by a pile of books she is surrounded by a pile of books on the floor she is surrounded by a pile of books on the floor she is surrounded by a pile of books on the floor her']

@NielsRogge
Contributor

Thanks for reporting, that should not be the case! I extensively tested the greedy/beam search outputs on original vs my implementation to make sure everything works as expected.

But the generate method has had some updates since then, so there might be a small issue. However, isn't it weird that the first token is already different? cc'ing @gante here

@NielsRogge
Contributor

Also, I'm not sure you can run both LAVIS and the Transformers main branch in the same environment to compare, because LAVIS relies on an older version of Transformers.

@AstraliteHeart
Author

Results on top are from transformers https://gist.github.com/AstraliteHeart/4d7ebf834021b8e1c9bc439c1633002c + your fork of lavis.

Some more tests (TL;DR: the latest transformers still does not produce the same output)

Official lavis repo:

['scootaloo, apple bloom, and applejack in a group hug scootaloo, apple bloom, and applejack are all smiling white background', 'scootaloo, applebloom, and applejack in a group hug scootaloo and applebloom are jumping applejack is smiling white background', 'scootaloo, apple bloom, and applejack in a group hug scootaloo, apple bloom, and applejack are jumping and smiling white background']
['alicorn twilight sparkle is laying on her back with a book on her head and a book on her chest she is surrounded by books on the floor and on the walls she has a book on her head and a book on her chest she is', 'alicorn twilight sparkle is laying on her back with a book on her head and a book on her chest she is surrounded by books on the floor and on the walls she is also wearing a book on her head and a book on her chest', 'alicorn twilight sparkle is laying on her back with a book on her head and a book on her chest she is surrounded by books on the floor and on the walls she has a book on her head and a book on her chest she has']

Latest transformers:

'caption_hf_16': [
            'a series of images of sweetie belle, applejack, scootaloo, applebloom, rarity, pinkie pie, twilight sparkle, rarity, twilight sparkle, rarity, rarity, rarity, rarity, rarity, rarity',
            'a series of images of sweetie belle, applejack, scootaloo, applebloom, rarity, pinkie pie, twilight sparkle, rarity, twilight sparkle, twilight sparkle, twilight sparkle, twilight sparkle',
            'a series of images of sweetie belle, applejack, scootaloo, applebloom, rarity, pinkie pie, twilight sparkle, rarity, twilight sparkle, twilight sparkle, rarity, rarity, rarity, rarity'
],
'caption_hf_8': [
            'a series of images of sweetie belle, applebloom, scootaloo, applejack, rarity, pinkie pie, twilight sparkle, fluttershy, rarity, pinkie pie, twilight sparkle, rarity, pink',
            'a series of images of sweetie belle, applebloom, scootaloo, applejack, rarity, pinkie pie, twilight sparkle, fluttershy, rarity, pinkie pie, twilight sparkle, twilight sparkle',
            'a series of images of sweetie belle, applebloom, scootaloo, applejack, rarity, pinkie pie, twilight sparkle, fluttershy, rarity, twilight sparkle, twilight sparkle, twilight sparkle'
]
'caption_hf_16': [
            'posterior view of twilight sparkle lying on the floor surrounded by a pile of books, surrounded by a pile of books, surrounded by a pile of books, surrounded by a pile of books, surrounded by a pile of books, surrounded by',
            'posterior view of twilight sparkle lying on the floor surrounded by a pile of books, surrounded by a pile of books, surrounded by a pile of books, surrounded by a pile of books, surrounded by a pile of books\n',
            'posterior view of twilight sparkle lying on the floor surrounded by a pile of books, surrounded by a pile of books, surrounded by books, surrounded by books, surrounded by books, surrounded by books, surrounded by books, surrounded by books'
],
'caption_hf_8': [
            'twilight sparkle is lying on the floor surrounded by a pile of books she is surrounded by a pile of books on the floor she is surrounded by a pile of books on the floor she is surrounded by a pile of books on the floor she',
            'twilight sparkle is lying on the floor surrounded by a pile of books she is surrounded by a pile of books on top of her she is surrounded by a pile of books on top of her she is surrounded by a pile of books on top',
            'twilight sparkle is lying on the floor surrounded by a pile of books she is surrounded by a pile of books on the floor she is surrounded by a pile of books on the floor she is surrounded by a pile of books on the floor her'
]

@gante
Member

gante commented Feb 25, 2023

Hey @AstraliteHeart 👋 Differences in generation can be explained by many parts of the stack, from ninja numerical bugs to intentional implementation quirks. Debugging the exact cause takes time, so I want to ask for your help :D

  1. Can you confirm that both lavis and transformers are recent versions? (latest release or newer)
  2. Comparing results with sampling is impossible, as minor changes like the order of operations will produce different results. Have you confirmed that the results are different without sampling? (you can ensure that it is not sampling if you are not setting seeds and you're still getting the same outputs)
  3. (If the answers to the questions above are positive) Can you please share a gist like the one you shared above, except without reliance on local data? It would help me get started 🤗
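When comparing deterministic outputs, it helps to know where two captions first part ways. The sketch below is a small standard-library helper for that; the name first_divergence is hypothetical, and it compares whitespace-separated words rather than model tokens, so it only approximates a token-level comparison. The two strings are prefixes of captions posted earlier in this thread.

```python
from typing import List, Optional


def first_divergence(reference: str, candidate: str) -> Optional[int]:
    """Index of the first whitespace-separated word that differs, or None if identical."""
    ref_words: List[str] = reference.split()
    cand_words: List[str] = candidate.split()
    for index, (ref, cand) in enumerate(zip(ref_words, cand_words)):
        if ref != cand:
            return index
    # One caption is a strict prefix of the other: they diverge where the shorter one ends.
    if len(ref_words) != len(cand_words):
        return min(len(ref_words), len(cand_words))
    return None


# Caption prefixes taken from the outputs posted earlier in this thread.
lavis_caption = "scootaloo, apple bloom, and applejack in a group hug"
hf_caption = "a series of images of sweetie belle, applejack, scootaloo"

print(first_divergence(lavis_caption, hf_caption))     # 0
print(first_divergence(lavis_caption, lavis_caption))  # None
```

A divergence at index 0 points toward a difference in the inputs or weights rather than accumulated numerical drift, which usually shows up later in the sequence.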

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions bot closed this as completed Apr 1, 2023

@TalhaUusuf

Quoting @AstraliteHeart: "Do you have any guidance on matching outputs between lavis and hf models?"

Can you please explain how you managed to convert this? I am also stuck. Is there a specific transformers version that is required?

@NielsRogge
Contributor

NielsRogge commented Jul 25, 2023

I have a PR here which aims to further verify equivalence: #24854.

The conversion script can be found here and can be run as follows:

pip install -U git+https://github.com/nielsrogge/LAVIS.git@blip2_float32
git clone -b improve_blip2 https://github.com/nielsrogge/transformers.git
cd transformers
python src/transformers/models/blip_2/convert_blip_2_original_to_pytorch.py --model_name "blip2-flan-t5-xl"

The reason I forked LAVIS is to make sure I can compare both implementations using float32.
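To see why precision matters when checking equivalence, the sketch below rounds a few values to half precision using only the standard library and measures the resulting error. The helper names to_half and max_abs_diff and the sample values are hypothetical and not taken from any model run. Python floats are double precision, so this shows half-precision rounding against a higher-precision reference rather than against float32 specifically.

```python
import struct
from typing import Sequence


def to_half(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical logits, not taken from any real model run.
logits = [0.1, -2.3456789, 7.001, 12.3456789]
half_logits = [to_half(x) for x in logits]

# Non-zero: most decimal values are not exactly representable in half precision.
print(max_abs_diff(logits, half_logits))
```

Errors of this size are harmless for a single value, but when two candidate tokens have nearly equal scores they can flip the argmax, which is one reason to compare implementations at higher precision first.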


5 participants