Running error #23

JonnesLin · 2025-02-28T20:38:29Z

Hi there! Thanks for your effort!

I'm getting an error when running stage 1:

[rank1]: Traceback (most recent call last):
[rank1]: File "/root/Forward-Forward/llava-train_videochat/llava/train/train_mem.py", line 7, in <module>
[rank1]: train()
[rank1]: File "/root/Forward-Forward/llava-train_videochat/llava/train/train.py", line 2081, in train
[rank1]: trainer.train()
[rank1]: File "/opt/conda/envs/FForward/lib/python3.9/site-packages/transformers/trainer.py", line 1859, in train
[rank1]: return inner_training_loop(
[rank1]: File "/opt/conda/envs/FForward/lib/python3.9/site-packages/transformers/trainer.py", line 2165, in _inner_training_loop
[rank1]: for step, inputs in enumerate(epoch_iterator):
[rank1]: File "/opt/conda/envs/FForward/lib/python3.9/site-packages/accelerate/data_loader.py", line 454, in __iter__
[rank1]: current_batch = next(dataloader_iter)
[rank1]: File "/opt/conda/envs/FForward/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
[rank1]: data = self._next_data()
[rank1]: File "/opt/conda/envs/FForward/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1344, in _next_data
[rank1]: return self._process_data(data)
[rank1]: File "/opt/conda/envs/FForward/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1370, in _process_data
[rank1]: data.reraise()
[rank1]: File "/opt/conda/envs/FForward/lib/python3.9/site-packages/torch/_utils.py", line 706, in reraise
[rank1]: raise exception
[rank1]: ValueError: Caught ValueError in DataLoader worker process 0.
[rank1]: Original Traceback (most recent call last):
[rank1]: File "/opt/conda/envs/FForward/lib/python3.9/site-packages/transformers/feature_extraction_utils.py", line 182, in convert_to_tensors
[rank1]: tensor = as_tensor(value)
[rank1]: File "/opt/conda/envs/FForward/lib/python3.9/site-packages/transformers/feature_extraction_utils.py", line 141, in as_tensor
[rank1]: return torch.tensor(value)
[rank1]: RuntimeError: Could not infer dtype of numpy.float32

[rank1]: During handling of the above exception, another exception occurred:

[rank1]: Traceback (most recent call last):
[rank1]: File "/opt/conda/envs/FForward/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
[rank1]: data = fetcher.fetch(index) # type: ignore[possibly-undefined]
[rank1]: File "/opt/conda/envs/FForward/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
[rank1]: data = [self.dataset[idx] for idx in possibly_batched_index]
[rank1]: File "/opt/conda/envs/FForward/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
[rank1]: data = [self.dataset[idx] for idx in possibly_batched_index]
[rank1]: File "/root/Forward-Forward/llava-train_videochat/llava/train/train.py", line 1446, in __getitem__
[rank1]: raise e
[rank1]: File "/root/Forward-Forward/llava-train_videochat/llava/train/train.py", line 1443, in __getitem__
[rank1]: sample = self._get_item(i)
[rank1]: File "/root/Forward-Forward/llava-train_videochat/llava/train/train.py", line 1499, in _get_item
[rank1]: raise e
[rank1]: File "/root/Forward-Forward/llava-train_videochat/llava/train/train.py", line 1490, in _get_item
[rank1]: image = processor.preprocess(video, return_tensors="pt")["pixel_values"]
[rank1]: File "/root/Forward-Forward/llava-train_videochat/llava/model/multimodal_encoder/umt_encoder.py", line 66, in preprocess
[rank1]: return BatchFeature(data=data, tensor_type=return_tensors)
[rank1]: File "/opt/conda/envs/FForward/lib/python3.9/site-packages/transformers/feature_extraction_utils.py", line 78, in __init__
[rank1]: self.convert_to_tensors(tensor_type=tensor_type)
[rank1]: File "/opt/conda/envs/FForward/lib/python3.9/site-packages/transformers/feature_extraction_utils.py", line 188, in convert_to_tensors
[rank1]: raise ValueError(
[rank1]: ValueError: Unable to create tensor, you should probably activate padding with 'padding=True' to have batched tensors with the same length.

The error was raised in umt_encoder.py, line 64: images = reduce(lambda x, f: [*map(f, x)], transforms, images)

I think the error occurs when the input is a video, i.e. when len(images) is greater than 1.
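For anyone reading along, the quoted line applies each transform in turn to every frame in the list. Here is a minimal sketch of that pattern; `apply_transforms`, `double` and `add_one` are made-up stand-ins for illustration, not the actual transforms in umt_encoder.py:

```python
from functools import reduce


def apply_transforms(images, transforms):
    # Same pattern as the quoted line: take the transforms in order and
    # map each one over every element of the current list.
    return reduce(lambda x, f: [*map(f, x)], transforms, images)


# Hypothetical stand-ins for the real image transforms.
def double(v):
    return v * 2


def add_one(v):
    return v + 1


frames = [1, 2, 3]  # stands in for a list of video frames
result = apply_transforms(frames, [double, add_one])
print(result)
```

The output is always a list with one entry per input frame, so a single image and a multi-frame video follow the same code path; only the list length differs.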

It is probably easy to debug, but you probably didn't see this problem on your machine, right?

Also, I ran into endless conflicts when installing packages from requirement.txt, so I don't think my environment matches everything it lists.

leexinhao · 2025-03-01T07:09:33Z

You could try installing the important packages (numpy, torch, and so on) yourself with pip rather than running pip install -r requirement.txt. Can you run inference with our model from Hugging Face? If so, I think training should only require changing a few packages.

leexinhao · 2025-03-01T07:10:51Z

You could try installing the important packages (numpy, torch, and so on) yourself with pip rather than running pip install -r requirement.txt. Can you run inference with our model from Hugging Face? If so, I think training should only require changing a few packages.

I think the most likely cause is the numpy or transformers version, because the error comes from processor.preprocess.
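A quick way to check which versions are actually installed is sketched below, using only the standard library. The helper names `installed_versions` and `major_version` are made up for this example. The NumPy 2.x hint is an assumption about the environment rather than something the traceback confirms: "Could not infer dtype of numpy.float32" is commonly reported when a torch build compiled against NumPy 1.x runs alongside NumPy 2.x.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    # Map each package name to its installed version, or None if it is missing.
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


def major_version(ver):
    # "2.1.3" -> 2; returns None when the package is not installed.
    return int(ver.split(".")[0]) if ver else None


versions = installed_versions(["numpy", "torch", "transformers"])
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")

# Assumption: a NumPy 2.x install next to an older torch build is one possible cause.
numpy_major = major_version(versions["numpy"])
if numpy_major is not None and numpy_major >= 2:
    print("NumPy 2.x detected; an older torch build may fail to convert NumPy values.")
```

If the versions differ from the ones the repository was tested with, pinning numpy, torch and transformers individually may be easier than resolving the full requirements file.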
