--use-assignment-cache error: Error: Git LFS file not pulled successfully. Please install git-lfs #403

Closed
jacaravas opened this issue Apr 4, 2022 · 9 comments · Fixed by #411

@jacaravas

I cannot get the --use-assignment-cache option to run correctly. The full output, including the error, is:

****
Pangolin running in usher mode.
****
Query file:     /home/XXX/pangolin_test/test_data.fasta
****
Data files found:
usher_pb:       /home/XXX/my_conda_envs/pangolin-4/lib/python3.8/site-packages/pangolin_data/data/lineageTree.pb
****
Error: Git LFS file not pulled successfully. Please install git-lfs
using conda or an alternative (not pip) then re-install pangolin-assignment
with pip install git+https://github.com/cov-lineages/pangolin-assignment.git

I have followed the instructions in the error message. After installing git-lfs, running git lfs install returns "Git LFS initialized.", so that appears to be working correctly. I then installed pangolin-assignment from its git repository as instructed, and also ran pangolin --add-assignment-cache manually for good measure.

Could you point me in the right direction for troubleshooting this? Is this possibly a certificate, firewall, or proxy issue?

Thanks,
Jason

Version information:

pangolin --all-versions
pangolin: 4.0.1
pangolin-data: 1.2.133
constellations: v0.1.4
scorpio: 0.3.16
pangolin-assignment: v1.2.133

Command line:
pangolin test_data.fasta -o test -t 2 --use-assignment-cache --verbose

@AngieHinrichs
Member

Sorry to hear that @jacaravas! Are you running pangolin --use-assignment-cache in a script/workflow or just on the command line? Can you send the output of these commands?

conda activate pangolin-4
conda list git-lfs
conda list pangolin

@jacaravas
Author

Hi @AngieHinrichs,

At the moment, I am just evaluating different options on the command line to see what impact they will have on our system performance. They will be added to a workflow later. --analysis-mode fast and --analysis-mode accurate are both working, but once I try --use-assignment-cache, I get the error described above.

There is no output from conda activate. I start the environment by sourcing conda's activate script explicitly:

source /apps/miniconda3/bin/activate ~/my_conda_envs/pangolin-4

conda list git-lfs:

# packages in environment at /home/XXX/my_conda_envs/pangolin-4:
#
# Name                    Version                   Build  Channel
git-lfs                   1.6                      pypi_0    pypi

conda list pangolin

# packages in environment at /home/XXX/my_conda_envs/pangolin-4:
#
# Name                    Version                   Build  Channel
pangolin                  4.0.1                    pypi_0    pypi
pangolin-assignment       1.2.133                  pypi_0    pypi
pangolin-data             1.2.133                  pypi_0    pypi

@IsabelFE

IsabelFE commented Apr 5, 2022

I am having the same issue. I am also testing the use of --use-assignment-cache.

error:

****
Pangolin running in usher mode.
****
Query file:     /data/cumulative_fasta/220301_nextseq_all.fasta
****
Data files found:
usher_pb:       /workspace/miniconda3/envs/pangolin/lib/python3.8/site-packages/pangolin_data/data/lineageTree.pb
****
Error: Git LFS file not pulled successfully. Please install git-lfs 
using conda or an alternative (not pip) then re-install pangolin-assignment 
with pip install git+https://github.com/cov-lineages/pangolin-assignment.git

conda list git-lfs:

# packages in environment at /workspace/miniconda3/envs/pangolin:
#
# Name                    Version                   Build  Channel
git-lfs                   1.6                      pypi_0    pypi

conda list pangolin

# packages in environment at /workspace/miniconda3/envs/pangolin:
#
# Name                    Version                   Build  Channel
pangolin                  4.0.2                    pypi_0    pypi
pangolin-assignment       1.2.133                  pypi_0    pypi
pangolin-data             1.2.133                  pypi_0    pypi

@AngieHinrichs
Member

Thanks! The conda list output shows that git-lfs is installed as expected, but the error message indicates that the installed cache file is just a git-lfs pointer, instead of the actual large file that should be installed in its place. So it seems that something went wrong during the pangolin --add-assignment-cache step.

Next command to try (in the same activated conda environment):

ls -l $CONDA_PREFIX/lib/python3.8/site-packages/pangolin_assignment/usher_assignments.cache.csv.gz 

I see a file size of 174192980 but you'll probably see something much smaller. If so, you can try removing and reinstalling the assignment cache:

rm -r $CONDA_PREFIX/lib/python3.8/site-packages/pangolin_assignment/
pangolin --add-assignment-cache
ls -l $CONDA_PREFIX/lib/python3.8/site-packages/pangolin_assignment/usher_assignments.cache.csv.gz 

Hopefully after running those commands you will see a file size of 174192980 too. (I ran those commands and see 174192980 as before.) If not, then something is definitely going wrong in --add-assignment-cache.
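
For readers who prefer to check the file's content rather than its size, here is a minimal sketch of the pointer-versus-real-file distinction described above. A Git LFS pointer is a small text file whose first line is `version https://git-lfs.github.com/spec/v1`, whereas the real cache is gzip data. This is not part of pangolin; the function name `describe_cache` and its return labels are assumptions made for illustration.

```python
from pathlib import Path

# First line of a Git LFS pointer file, per the Git LFS pointer format (v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def describe_cache(path):
    """Classify a cache file as 'missing', 'lfs-pointer', 'gzip', or 'unknown'.

    Hypothetical helper for illustration; pangolin does not provide it.
    """
    path = Path(path)
    if not path.is_file():
        return "missing"
    # Only the first few bytes are needed to tell the two cases apart.
    with open(path, "rb") as handle:
        head = handle.read(64)
    if head.startswith(LFS_POINTER_PREFIX):
        return "lfs-pointer"  # stub only: the large file was never pulled
    if head.startswith(GZIP_MAGIC):
        return "gzip"  # looks like the real compressed cache
    return "unknown"
```

Calling `describe_cache` on the usher_assignments.cache.csv.gz path from the ls command above should return "lfs-pointer" in a broken install and "gzip" once the real file is in place.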

@IsabelFE

IsabelFE commented Apr 5, 2022

ls -l $CONDA_PREFIX/lib/python3.8/site-packages/pangolin_assignment/usher_assignments.cache.csv.gz 
-rw-rw-r-- 1 ssm-user ssm-user 134 Apr  5 17:43 /workspace/miniconda3/envs/pangolin/lib/python3.8/site-packages/pangolin_assignment/usher_assignments.cache.csv.gz

I deleted the directory and reinstalled, and got a file of the same size (134 bytes).

@AngieHinrichs
Member

Thanks @IsabelFE!

I was able to reproduce the problem in an environment that did not already have git-lfs. I think the cause is that installing git-lfs with conda is not enough: git-lfs install must also be run (good idea @jacaravas), and it must be run before pangolin --add-assignment-cache. This series of commands fixed the problem in my environment:

git-lfs install
rm -r $CONDA_PREFIX/lib/python3.8/site-packages/pangolin_assignment*
pangolin --add-assignment-cache
ls -l $CONDA_PREFIX/lib/python3.8/site-packages/pangolin_assignment/usher_assignments.cache.csv.gz 

I think the solution (in the next patch release of pangolin) will be for pangolin --add-assignment-cache to run git-lfs install before it installs the pangolin-assignment repository.
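
For anyone scripting this workaround in the meantime, the sequence above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the planned patch: the function name `reinstall_assignment_cache` and its injectable `run` parameter are hypothetical, and only the commands themselves (git-lfs install, pangolin --add-assignment-cache) come from this thread.

```python
import glob
import os
import shutil
import subprocess


def reinstall_assignment_cache(site_packages, run=subprocess.run):
    """Repeat the repair sequence from this thread, in the same order.

    `site_packages` is the environment's site-packages directory.
    `run` is injectable (an assumption of this sketch) so the ordering can be
    exercised without git-lfs or pangolin being installed.
    Returns the size in bytes of the reinstalled cache file, or None if absent.
    """
    # 1. Initialize Git LFS first; the thread found this must precede the reinstall.
    run(["git-lfs", "install"], check=True)

    # 2. Remove the installed package and its metadata (pangolin_assignment*).
    for path in glob.glob(os.path.join(site_packages, "pangolin_assignment*")):
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)

    # 3. Reinstall the cache through pangolin itself.
    run(["pangolin", "--add-assignment-cache"], check=True)

    # 4. Report the size of the cache file so it can be compared by eye.
    cache = os.path.join(
        site_packages, "pangolin_assignment", "usher_assignments.cache.csv.gz"
    )
    return os.path.getsize(cache) if os.path.isfile(cache) else None
```

If this is run with the conda environment's own Python, `sysconfig.get_paths()["purelib"]` is one way to obtain the site-packages directory without hardcoding python3.8. A return value in the hundreds of bytes rather than roughly 174 MB would suggest the file is still only a pointer.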

@IsabelFE

IsabelFE commented Apr 5, 2022

Thanks @AngieHinrichs!!

I followed your instructions and this time I got the full-size file (174192980 bytes). I am testing --use-assignment-cache now and I am no longer getting the error.

@AngieHinrichs
Member

Great, thanks so much @IsabelFE for confirming, and sorry about the inconvenience! I will make sure this is fixed soon.

@jacaravas
Author

This resolved my issue. Thanks, @AngieHinrichs!
