Instruction for Use on the ParamShivay and other SLURM systems #94

Closed
singhakr opened this issue Aug 16, 2024 · 5 comments

@singhakr

Could you please add instructions for installation and usage on the Param Shivay system, which many higher education institutions (HEIs) in India use for computation? Alternatively, could you confirm that it will work simply by creating a virtual environment and installing with the provided script? This information could save a lot of time that might otherwise be wasted trying to use it the wrong way.

Thanks for the great work! I ask most of my students to use AI4Bharat models, at least to start with.

@singhakr
Author

I tried to install using the instructions given, but I get the following error on the Param Shivay system:

`Checking out files: 100% (1619/1619), done.
Processing /home/user.name/IndicTrans2/fairseq
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [23 lines of output]
Traceback (most recent call last):
  File "/home/user.name/anaconda3/envs/itv2/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
    main()
  File "/home/user.name/anaconda3/envs/itv2/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/home/user.name/anaconda3/envs/itv2/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
    return hook(config_settings)
  File "/tmp/pip-build-env-6np9gfm8/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 327, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=[])
  File "/tmp/pip-build-env-6np9gfm8/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 297, in _get_build_requires
    self.run_setup()
  File "/tmp/pip-build-env-6np9gfm8/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 313, in run_setup
    exec(code, locals())
  File "<string>", line 12, in <module>
  File "/tmp/pip-build-env-6np9gfm8/overlay/lib/python3.9/site-packages/torch/__init__.py", line 289, in <module>
    _load_global_deps()
  File "/tmp/pip-build-env-6np9gfm8/overlay/lib/python3.9/site-packages/torch/__init__.py", line 245, in _load_global_deps
    raise err
  File "/tmp/pip-build-env-6np9gfm8/overlay/lib/python3.9/site-packages/torch/__init__.py", line 226, in _load_global_deps
    ctypes.CDLL(global_deps_lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/home/user.name/anaconda3/envs/itv2/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /tmp/pip-build-env-6np9gfm8/overlay/lib/python3.9/site-packages/torch/lib/libtorch_global_deps.so: failed to map segment from shared object: Operation not permitted
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1`
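
The final line of the traceback, `failed to map segment from shared object: Operation not permitted`, is the message the dynamic loader commonly gives when a shared library sits on a filesystem mounted with `noexec`, and pip's build environment here lives under `/tmp`. Whether that is the actual cause on Param Shivay is not confirmed in this thread. The sketch below is one way to check it, assuming a Linux system with `/proc/mounts`; the helper name `mount_options_for` is made up for this illustration.

```python
import os
import tempfile


def mount_options_for(path, mounts_text):
    """Return the mount options of the filesystem that contains `path`.

    `mounts_text` is text in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    path = os.path.abspath(path)
    best_point, best_opts = "", []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        point, opts = fields[1], fields[3].split(",")
        # A mount point contains `path` if it is the path itself or a parent.
        prefix = point.rstrip("/") + "/"
        if path == point or path.startswith(prefix):
            # Prefer the deepest mount point; on ties, the later entry wins.
            if len(point) >= len(best_point):
                best_point, best_opts = point, opts
    return best_opts


if __name__ == "__main__":
    # tempfile honours TMPDIR, so this is the directory pip would build in.
    tmp_dir = os.path.realpath(tempfile.gettempdir())
    try:
        with open("/proc/mounts") as f:
            opts = mount_options_for(tmp_dir, f.read())
    except OSError:
        opts = []  # not Linux, or /proc is unavailable
    if "noexec" in opts:
        print(f"{tmp_dir} is mounted noexec; shared libraries cannot be loaded from it")
    else:
        print(f"no noexec option found for {tmp_dir}")
```

If the check reports `noexec`, a possible workaround (an assumption, not something tested in this thread) is to point the `TMPDIR` environment variable at a directory under the home directory before running `install.sh`, since Python's `tempfile` module, which pip relies on, honours that variable.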

@PranjalChitale
Collaborator

The install.sh script should work regardless of the cluster you intend to run your experiments on.

Regarding the traceback you shared, the following issues might be relevant for debugging:
pytorch/pytorch#16558
#85

@singhakr
Author

singhakr commented Aug 17, 2024

But I am getting this error while running the install.sh script on Param Shivay, not while doing translation or importing PyTorch.

The installation does not finish correctly and so I am not able to try translation.

The error occurs at the following point during installation:

Processing /home/user.name/IndicTrans2/fairseq

@singhakr
Author

Since the link you gave mentions lack of memory as the problem, should I retry the installation from a compute node, submitted as a job, rather than installing the usual way?
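
One way to test the memory hypothesis before resubmitting is to compare the per-process address-space limit on the login node with the limit inside a job. The sketch below uses only Python's standard `resource` module; the function names are made up for this illustration, and it is only an assumption that a login-node limit is involved here.

```python
import resource


def format_limit(value):
    """Render an rlimit value (in bytes) as a readable string."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / 2**30:.1f} GiB"


def address_space_limits():
    """Return the (soft, hard) virtual memory limits of this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return format_limit(soft), format_limit(hard)


if __name__ == "__main__":
    soft, hard = address_space_limits()
    print(f"address space limit: soft={soft}, hard={hard}")
```

Running the same script once on the login node and once inside a job shows whether the two environments differ. If both report `unlimited`, a memory limit of this kind is unlikely to explain the failure.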

@singhakr
Author

I have managed to get it working with the HuggingFace interface. I was probably doing something simple the wrong way.
