RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU. #3
Hi @diabolo98 , this error lies in the facexlib dependency, not in inference.py. The traceback shows the failing torch.load call is inside facexlib/alignment/__init__.py. Please, in your installed copy of that file, change the torch.load(model_path) call so that it passes map_location=torch.device('cpu').
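For reference, a small sketch of what that change amounts to, written as a standalone helper rather than the actual facexlib code; the only difference from the line shown in the traceback below is the added map_location argument:

```python
import torch
from torch import nn

def load_checkpoint_cpu(model: nn.Module, model_path: str) -> nn.Module:
    """Load a CUDA-saved checkpoint on a CPU-only machine.

    This mirrors the suggested edit to facexlib's alignment/__init__.py:
    map_location remaps the CUDA storages to CPU so deserialization does not
    require torch.cuda.is_available() to be True.
    """
    state = torch.load(model_path, map_location=torch.device('cpu'))
    model.load_state_dict(state['state_dict'], strict=True)
    return model
```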
Thanks, I would've never thought of changing the init file. It fixed the issue, but xtalker still wouldn't work: after that problem was solved, I hit another issue related to intel_extension_for_pytorch. I checked its installation guide and found out it needs to match the torch version, so I installed the matching version. Then I got an error about the CPU not having AVX512 and friends, which I then "fixed?" by changing the dtype from bfloat16 to float on line 78.
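Not part of the fix described above, but a hedged sketch of how one could choose the dtype automatically instead of hand-editing the file; it assumes a Linux host (it reads /proc/cpuinfo) and that the surrounding code accepts a torch dtype, which may not match how xtalker actually wires this up:

```python
import torch

def pick_cpu_dtype() -> torch.dtype:
    """Fall back from bfloat16 to float32 when the CPU lacks AVX512-BF16.

    Rough heuristic based on /proc/cpuinfo flags (Linux only); this is an
    illustrative check, not xtalker's own dtype selection logic.
    """
    try:
        with open('/proc/cpuinfo') as f:
            flags = f.read()
        if 'avx512_bf16' in flags:
            return torch.bfloat16
    except OSError:
        pass
    return torch.float32

print(pick_cpu_dtype())  # typically torch.float32 on a Colab CPU runtime
```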
Hi @diabolo98 , I just noticed that you are running the experiment on Colab. I have never tested it on Colab with CPU only; Colab's CPUs are limited. I took a quick look at a Colab CPU and there is only 1 physical core with hyperthreading, which means the IOMP parallel optimization in xtalker should not be enabled, because there is nothing to parallelize over on a single physical core. Of course, you are not enabling that yet anyway. However, I think the BFloat16 optimization alone could still give some performance improvement (1~2x); but as you mentioned, you found that the Colab CPU does not support AVX512, so you had to fall back from bfloat16 to float there, which removes that gain as well.
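Not from the thread itself, but a small sketch of how one could check whether multi-core parallelism is worth enabling at all; the "only above 1 physical core" rule is the heuristic described above, and the parsing below is a best-effort stand-in, not xtalker's actual code:

```python
import os

def physical_core_count() -> int:
    """Best-effort physical core count on Linux (Colab runs Linux).

    Counts unique (physical id, core id) pairs in /proc/cpuinfo and falls
    back to the logical CPU count if parsing fails.
    """
    cores = set()
    physical_id = None
    try:
        with open('/proc/cpuinfo') as f:
            for line in f:
                if line.startswith('physical id'):
                    physical_id = line.split(':')[1].strip()
                elif line.startswith('core id'):
                    cores.add((physical_id, line.split(':')[1].strip()))
    except OSError:
        pass
    return len(cores) or (os.cpu_count() or 1)

use_parallel = physical_core_count() > 1
print(f"physical cores: {physical_core_count()}, enable parallel optimization: {use_parallel}")
```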
I completely forgot to add that I had also tried with the int8 branch. Your help was deeply appreciated. Thanks again.
The int8 optimization itself does not depend on intel extension for pytorch here, so it will not raise the above errors. But again, I've not tested it in the Colab environment yet. Will do some experiments over the weekend.
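For readers unfamiliar with CPU int8, here is a generic, hedged illustration using PyTorch's built-in dynamic quantization. xtalker's actual int8 path is built on Intel Neural Compressor (see the later reply), so this only sketches the general idea of running int8 on plain CPU PyTorch without the Intel extension:

```python
import torch
import torch.nn as nn

# A toy model standing in for a network submodule (illustrative only).
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 64)).eval()

# Dynamic quantization converts Linear weights to int8 and quantizes
# activations on the fly; it needs only stock CPU PyTorch.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 256)
print(quantized(x).shape)  # torch.Size([1, 64])
```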
I think Colab CPUs are too limited to be of any use here. I tried to use tortoise TTS, which advertises a huge speed-up with half precision, DeepSpeed and KV cache optimization, but on Colab it estimates 4 hours for a few minutes of TTS, so no matter how much you optimize xtalker it will probably still be unbearably slow. Nonetheless, I have a few questions:
Colab is truly limited in CPU because it expects you to use a GPU or TPU for compute-intensive work. My optimization is tested on Xeon Sapphire Rapids (check the README), so I fully understand that you think it makes no sense to do it on Colab.

I do not think the converted int8 model will work on GPU. My int8 optimization is based on https://github.com/intel/neural-compressor, which is mostly tested on Intel Xeon CPUs. I recommend trying TensorRT if you really want to do int8 optimization on an NVIDIA GPU.

I think Colab is enough for the int8 conversion itself with CPU, but I guess it will be slow. You can share the converted model on HF if you want, but you should follow the MIT license, which the original SadTalker has.

By the way, xtalker is unrelated to TTS. You should not take the slow speed of any TTS component as a reason that xtalker will also be slow. BTW, I've tested tortoise TTS before; tortoise TTS is, as its name implies, SUPER slow. If you want to pipeline TTS + xtalker, try other TTS alternatives :)
I usually try each component individually, so it is not a problem of using multiple models at once. I know xtalker is unrelated to TTS; I gave it as an example because they advertised a huge boost in speed on CPU, and so did bark TTS (or was it just bark? I tried both), but despite that they're still extremely slow on the Colab CPU. On the other hand, tortoise TTS recently became relatively fast on the Colab GPU and is now very usable. I had hoped int8 would improve the speed on GPU, but as you said it wouldn't work. Also, sadly, my knowledge about AI is extremely limited, so I don't think I can use TensorRT to do the int8 conversion for GPU unless it's a drag-and-drop process that requires only relatively simple adaptation and a lot of documentation reading.
No problem, your feedback is welcome :)
Hello, I wanted to try your fork of SadTalker in Colab, but I keep getting this error:
--------device----------- cpu
Traceback (most recent call last):
  File "inference.py", line 217, in <module>
    main(args)
  File "inference.py", line 43, in main
    preprocess_model = CropAndExtract(sadtalker_paths, device)
  File "/content/xtalker/src/utils/preprocess.py", line 49, in __init__
    self.propress = Preprocesser(device)
  File "/content/xtalker/src/utils/croper.py", line 22, in __init__
    self.predictor = KeypointExtractor(device)
  File "/content/xtalker/src/face3d/extract_kp_videos_safe.py", line 28, in __init__
    self.detector = init_alignment_model('awing_fan',device=device, model_rootpath=root_path)
  File "/usr/local/lib/python3.8/dist-packages/facexlib/alignment/__init__.py", line 19, in init_alignment_model
    model.load_state_dict(torch.load(model_path)['state_dict'], strict=True)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 930, in _legacy_load
    result = unpickler.load()
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 876, in persistent_load
    wrap_storage=restore_location(obj, location),
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 152, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 136, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
I followed the instructions and tried with the int8 branch and the main branch too. I tried making it work for hours, but no success with or without GPU. I tried adding map_location=torch.device('cpu') to every instance of torch.load in inference.py.
I changed the installed Python version from Python 3.10 to 3.8 because it's required by SadTalker. I believe I tried with Python 3.10 too, but I doubt the problem is with the Python version.
Thanks in advance.
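Not something tried in the thread, but since patching inference.py alone cannot reach the torch.load call inside facexlib, one alternative to editing the installed package is to patch torch.load process-wide before facexlib is imported. A hedged sketch, assuming it runs at the very top of the entry script:

```python
import functools
import torch

# Wrap torch.load so every call defaults to CPU mapping, including calls made
# inside dependencies such as facexlib. This must run before those modules
# load their checkpoints.
_original_load = torch.load

@functools.wraps(_original_load)
def _cpu_load(*args, **kwargs):
    kwargs.setdefault('map_location', torch.device('cpu'))
    return _original_load(*args, **kwargs)

torch.load = _cpu_load
```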