While LORA training: permission denied for '/home/appuser/.cache/huggingface' #859

Closed
andrey-lepekhin opened this issue May 27, 2023 · 1 comment

Comments

@andrey-lepekhin

Trying to train a LoRA fails with a permission error:
PermissionError: [Errno 13] Permission denied: '/home/appuser/.cache/huggingface'

Full trace

accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --enable_bucket --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="/dataset/results_innop/img" --reg_data_dir="/dataset/results_innop/reg" --resolution=512,512 --output_dir="/dataset/results_innop/model" --logging_dir="/dataset/results_innop/log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.0001 --network_dim=128 --output_name="innop_v1" --lr_scheduler_num_cycles="15" --learning_rate="0.0001" --lr_scheduler="cosine" --lr_warmup_steps="2268" --train_batch_size="1" --max_train_steps="22680" --save_every_n_epochs="3" --mixed_precision="fp16" --save_precision="fp16" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="0" --bucket_reso_steps=64 --xformers --bucket_no_upscale
2023-05-27 17:34:51.258871: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-27 17:34:52.500315: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-05-27 17:34:53.085678: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
There was a problem when trying to write in your cache folder (/home/appuser/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--mixed_precision` was set to a value of `'no'`
	`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
2023-05-27 17:35:02.229659: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-05-27 17:35:02.403447: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-05-27 17:35:02.466345: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
There was a problem when trying to write in your cache folder (/home/appuser/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
prepare tokenizer


Traceback (most recent call last):
  File "/app/train_network.py", line 783, in <module>
    train(args)
  File "/app/train_network.py", line 78, in train
    tokenizer = train_util.load_tokenizer(args)
  File "/app/library/train_util.py", line 2902, in load_tokenizer
    tokenizer = CLIPTokenizer.from_pretrained(original_path)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 1763, in from_pretrained
    resolved_vocab_files[file_id] = cached_file(
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1131, in hf_hub_download
    os.makedirs(storage_folder, exist_ok=True)
  File "/usr/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/appuser/.cache/huggingface'
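
The trace shows `os.makedirs` recursing up the path until it reaches a directory it has to create and is not allowed to. A minimal diagnostic sketch, assuming a Linux container: it walks up from the cache path to the nearest directory that already exists and reports whether the current user can write there. The helper names `nearest_existing_ancestor` and `explain_cache_failure` are made up for illustration and are not part of kohya_ss or huggingface_hub.

```python
import os


def nearest_existing_ancestor(path: str) -> str:
    """Walk up from `path` until a directory that already exists is found."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current


def explain_cache_failure(cache_dir: str) -> str:
    """Report whether the directory that makedirs must write into is writable."""
    ancestor = nearest_existing_ancestor(cache_dir)
    # Creating a subdirectory needs both write and search (execute) permission.
    writable = os.access(ancestor, os.W_OK | os.X_OK)
    state = "writable" if writable else "NOT writable"
    return f"{ancestor} is {state}"


# Example (path taken from the error message above):
# print(explain_cache_failure("/home/appuser/.cache/huggingface"))
```

If the reported directory is not writable for the container user, any library that tries to create its cache below it will fail the same way.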
@andrey-lepekhin
Author

Manually creating the `.cache/huggingface` directory on the host and adding it to docker-compose.yml, so the container gets a writable cache directory, solves the problem.
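
The warning in the trace also suggests pointing the cache at a writable directory through an environment variable. As an alternative to the docker-compose.yml change, here is a minimal sketch of that approach. It is not what the fix commit does. It assumes `HF_HOME` (the huggingface_hub variable for the cache root) is honored by the installed versions, and that it is set before `transformers` or `huggingface_hub` is imported. The function names and the example paths are hypothetical.

```python
import os


def pick_writable_cache(candidates):
    """Return the first candidate directory that can be created and written to."""
    for candidate in candidates:
        try:
            os.makedirs(candidate, exist_ok=True)
        except OSError:
            # Covers PermissionError and similar failures; try the next one.
            continue
        if os.access(candidate, os.W_OK | os.X_OK):
            return candidate
    raise RuntimeError(f"no writable cache directory among: {list(candidates)!r}")


def configure_hf_cache(candidates):
    """Set HF_HOME to a writable directory and return the chosen path.

    Call this before importing transformers or huggingface_hub, since the
    cache location is normally resolved when those packages are imported.
    """
    cache_dir = pick_writable_cache(candidates)
    os.environ["HF_HOME"] = cache_dir
    return cache_dir


# Hypothetical usage at the very top of a training script:
# configure_hf_cache(["/home/appuser/.cache/huggingface", "/dataset/hf_cache"])
```

Note that a cache directory inside the container is lost when the container is removed, so mounting a host directory as described above also avoids downloading the model again on each run.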

andrey-lepekhin added a commit to andrey-lepekhin/kohya_ss that referenced this issue May 27, 2023
Add a writable cache directory for huggingface. Fixes bmaltais#859
bmaltais pushed a commit that referenced this issue Oct 10, 2023