Request python package Qiime2 and RPy2 for class CCB 293! #2737

Closed
balajialg opened this issue Sep 10, 2021 · 84 comments
@balajialg
Contributor

Which package do you want installed? Please provide more information about the package, such as the version you require, missing functionality you need, and a URL to the website for the package.

qiime2: https://docs.qiime2.org/2021.4/interfaces/artifact-api/
DESeq2: which is an R package, but I think there is a way to run it in a Jupyter notebook using the package 'rpy2'?

Are there specific package dependencies?

Which hub do you want it installed on? For example r.datahub, datahub, data100.datahub, etc.
Datahub

Which class will use this package?
CCB 293

Include a link to appropriate entry in https://classes.berkeley.edu/ if available. If not, please mention class name & instructor.

Which semester will this package be used for?
Fall, 2021
This helps us clean up unused packages after a term ends.

Any additional information we should know about?

@balajialg
Contributor Author

Teaching team will make edits to this request!

@kferger320

Hi, here are a few edits to this request:

Which package do you want installed? Please provide more information about the package, such as the version you require, missing functionality you need, and a URL to the website for the package.

qiime2: version: QIIME 2 Core 2021.8 distribution
https://docs.qiime2.org/2021.4/interfaces/artifact-api/

Are there specific package dependencies?

Which hub do you want it installed on? For example r.datahub, datahub, data100.datahub, etc.
biology.datahub.berkeley.edu

Which class will use this package?
CCB 293

Include a link to appropriate entry in https://classes.berkeley.edu/ if available. If not, please mention class name & instructor.
Doctoral Seminar in Computational Biology
Professors Priya Moorjani and Ashley Wolf

Which semester will this package be used for?
Fall, 2021
This helps us clean up unused packages after a term ends.

Any additional information we should know about?

@balajialg balajialg added the package-request Package addition request for a hub label Sep 16, 2021
@felder
Contributor

felder commented Sep 16, 2021

@yuvipanda here's a link to the installation instructions. What do you think?

https://docs.qiime2.org/2021.8/install/native/

@felder
Contributor

felder commented Sep 16, 2021

@petersudmant giving you a heads up on this too to see if it's something you'd be willing to tackle.

@yuvipanda
Contributor

https://raw.githubusercontent.com/qiime2/environment-files/master/2021.8/release/qiime2-2021.8-py38-linux-conda.yml is the environment.yml file it needs, including its own channel. I think this is something we can try on biology hub as we already get many things from conda there. But we'll need to figure out which packages from there we actually need here. I'm not sure how exactly to do that - maybe there's documentation for it somewhere I can't find.

@felder rpy2 should be easy to add tho - I think it's just a python package?

@felder
Contributor

felder commented Sep 16, 2021

@balajialg @kferger320 are DESeq2 and rpy2 still part of this request? If so, should they still be installed on datahub, or on the biology hub instead?

@felder
Contributor

felder commented Sep 16, 2021

@yuvipanda well it's a bit confusing because there's also DESeq2, which might be runnable in a Jupyter notebook via rpy2?

@balajialg
Contributor Author

balajialg commented Sep 16, 2021

@felder The teaching team is considering alternatives to the DESeq2 package and will file a new package request for those. For now, they only need the qiime2 package installed. It should go on the biology hub, as they will be running complex computations.

@petersudmant
Contributor

DESeq2 should be super easy - it's just an R package. I've done lots of similar packages. Happy to do that if you'd like. Re qiime2, looks possible. Would prefer someone else to tackle it, but happy to give it a shot.

@petersudmant
Contributor

Tell them that EBSeq is already on there, and it works really well

@felder
Contributor

felder commented Sep 16, 2021

@petersudmant it seems DESeq2 is not required for the time being.

@yuvipanda maybe this gives us a reasonable list of packages to think about?
https://docs.qiime2.org/2021.8/install/#recommendations

I dunno, they do say that there are a lot of dependencies. Would it make sense to do as the qiime2 folks recommend and install it as a whole new conda environment and perhaps the students will need to activate it?

@felder
Contributor

felder commented Sep 16, 2021

Tell them that EBseq is already on there which works really well

@kferger320 @balajialg

@petersudmant
Contributor

installing as a "whole new conda env" makes sense to me - I have done this before. You can see deployments/biology/image/ib134-packages.bash - we can make one for CCB293 as well. I think this should be smooth?! Shall I try?

@felder
Contributor

felder commented Sep 16, 2021

@petersudmant if you're willing, that'd be awesome!

@petersudmant
Contributor

petersudmant commented Sep 17, 2021

OK, done! @balajialg @kferger320 @felder works great! Run w/ the following

>bash
>conda activate qiime2-2021.8

It's building on staging now - I'll double-check it works in the a.m. and push it to prod (or anyone else can feel free to do that if the build looks good).

@kferger320

@petersudmant @balajialg @felder thanks so much for getting this done so quickly! I will test it out once everything's pushed

@petersudmant
Contributor

it's all uploaded now. I had to do

conda init bash
bash
conda activate qiime2-2021.8

then it works. Works great
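
The shell steps above need an interactive terminal. As a rough sketch of how the same environment could be reached from Python without activating it first, conda's `conda run -n <env> <command>` subcommand can be wrapped in a small helper. The environment name `qiime2-2021.8` comes from this thread; the helper names `conda_run_command` and `run_in_env` are made up for illustration, and whether this suits the class workflow is an open question.

```python
import subprocess


def conda_run_command(env_name: str, *command: str) -> list:
    """Build a `conda run` invocation that executes `command` inside `env_name`."""
    return ["conda", "run", "-n", env_name, *command]


def run_in_env(env_name: str, *command: str) -> subprocess.CompletedProcess:
    """Execute the command in the named conda environment and capture its output.

    Requires `conda` on PATH; not called below so the sketch runs anywhere.
    """
    return subprocess.run(
        conda_run_command(env_name, *command),
        capture_output=True,
        text=True,
    )


# Show the command line that would be executed for the environment in this thread.
cmd = conda_run_command("qiime2-2021.8", "qiime", "--version")
print(" ".join(cmd))
```

On the hub, `run_in_env("qiime2-2021.8", "qiime", "--version").stdout` would then hold whatever the CLI printed, assuming the environment exists under that name.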

@felder
Contributor

felder commented Sep 17, 2021

@petersudmant Has this been merged into prod? If you need that done, please let me know or open a PR and make me a reviewer.

@petersudmant
Contributor

YEP - should be! Yuvi did this a.m.

@felder
Contributor

felder commented Sep 17, 2021

@petersudmant ok thanks!

@felder felder closed this as completed Sep 17, 2021
@balajialg
Contributor Author

balajialg commented Sep 27, 2021

@kferger320's request here - I'm trying to test out qiime2 in a Jupyter notebook on biology.datahub.berkeley.edu and I'm having a hard time figuring out how to load it in the notebook since it was installed as a conda environment. We want to be able to use the package in a Jupyter notebook for the class, so could you let me know the best way to get this working?

@petersudmant I tried running the steps you outlined, but I am also having difficulty importing the qiime2 package. Can you check and let us know what we are missing?

Ran this command in the terminal
conda init bash
bash
conda activate qiime2-2021.8

and tried importing the qiime2 package in the jupyter notebook,

import qiime2 as q2

but getting this error

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/tmp/ipykernel_4191/2263730633.py in <module>
----> 1 import qiime2 as q2

ModuleNotFoundError: No module named 'qiime2'

Any idea what are we missing?
Here is the link to my Datahub instance

@petersudmant
Contributor

petersudmant commented Sep 27, 2021

Right - So the commands I wrote out are for access to qiime2 in the command line,

eg:

jovyan@jupyter-psudmant:~$ conda init bash
no change     /opt/conda/condabin/conda
no change     /opt/conda/bin/conda
no change     /opt/conda/bin/conda-env
no change     /opt/conda/bin/activate
no change     /opt/conda/bin/deactivate
no change     /opt/conda/etc/profile.d/conda.sh
no change     /opt/conda/etc/fish/conf.d/conda.fish
no change     /opt/conda/shell/condabin/Conda.psm1
no change     /opt/conda/shell/condabin/conda-hook.ps1
no change     /opt/conda/lib/python3.9/site-packages/xontrib/conda.xsh
no change     /opt/conda/etc/profile.d/conda.csh
no change     /home/jovyan/.bashrc
No action taken.
jovyan@jupyter-psudmant:~$
jovyan@jupyter-psudmant:~$ bash
(base) jovyan@jupyter-psudmant:~$ conda activate qiime2-2021.8
(qiime2-2021.8) jovyan@jupyter-psudmant:~$ qiime
Usage: qiime [OPTIONS] COMMAND [ARGS]...

  QIIME 2 command-line interface (q2cli)
  --------------------------------------

  To get help with QIIME 2, visit https://qiime2.org.

  To enable tab completion in Bash, run the following command or add it to
  your .bashrc/.bash_profile:

      source tab-qiime

  To enable tab completion in ZSH, run the following commands or add them to
  your .zshrc:

      autoload -Uz compinit && compinit
      autoload bashcompinit && bashcompinit
      source tab-qiime

Options:
  --version   Show the version and exit.
  --help      Show this message and exit.

Commands:
  info                Display information about current deployment.
  tools               Tools for working with QIIME 2 files.
  dev                 Utilities for developers and advanced users.
  alignment           Plugin for generating and manipulating alignments.
  composition         Plugin for compositional data analysis.
  cutadapt            Plugin for removing adapter sequences, primers, and
                      other unwanted sequence from sequence data.

  dada2               Plugin for sequence quality control with DADA2.
  deblur              Plugin for sequence quality control with Deblur.
  demux               Plugin for demultiplexing & viewing sequence quality.
  diversity           Plugin for exploring community diversity.
  diversity-lib       Plugin for computing community diversity.
  emperor             Plugin for ordination plotting with Emperor.
  feature-classifier  Plugin for taxonomic classification.
  feature-table       Plugin for working with sample by feature tables.
  fragment-insertion  Plugin for extending phylogenies.
  gneiss              Plugin for building compositional models.
  longitudinal        Plugin for paired sample and time series analyses.
  metadata            Plugin for working with Metadata.
  phylogeny           Plugin for generating and manipulating phylogenies.
  quality-control     Plugin for quality control of feature and sequence data.
  quality-filter      Plugin for PHRED-based filtering and trimming.
  sample-classifier   Plugin for machine learning prediction of sample
                      metadata.

  taxa                Plugin for working with feature taxonomy annotations.
  vsearch             Plugin for clustering and dereplicating with vsearch.
(qiime2-2021.8) jovyan@jupyter-psudmant:~$

Your use case is different - I didn't realize you wanted to code in Python. I'm not sure how to do this. The only way I can think of is to create a new Jupyter notebook kernel - here are instructions on how this CAN work. Not sure if this would work, though. We could try.
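
The mismatch described here (a conda environment activated in a terminal, while the notebook kernel keeps running its own interpreter) can be checked from inside a notebook cell. The following is a minimal diagnostic sketch using only the standard library; the function name `describe_kernel` is made up for illustration.

```python
import importlib.util
import sys


def describe_kernel(module_name: str = "qiime2") -> dict:
    """Report which interpreter this kernel runs and whether it can see a module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,  # interpreter behind the current kernel
        "prefix": sys.prefix,          # environment that interpreter belongs to
        "module_found": spec is not None,
        "module_origin": getattr(spec, "origin", None),
    }


for key, value in describe_kernel().items():
    print(f"{key}: {value}")
```

If `prefix` points at the base installation rather than the qiime2 environment, `import qiime2` will fail in that kernel no matter what was activated in a separate terminal.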

@balajialg
Contributor Author

balajialg commented Sep 28, 2021

@petersudmant Thank you so much for clarifying the installation set up! I didn't realize it was for terminal-based usage.

@kferger320 Would you be able to use the terminal commands for qiime2 (as outlined above by @petersudmant)? Or is having qiime2 available in Jupyter notebooks a must?

@petersudmant Is this something that you would be able to help with? I tried replicating the blog's suggestions in my notebook, but I had difficulty running the import statement despite switching to the new conda environment with the qiime package installed. (Followed the first two steps outlined in the blog).

conda create --name qiime-env
conda activate qiime-env
conda install -c qiime2 qiime2
ipython kernel install --user --name=qiime-env
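
For context on what a kernel registration produces: a Jupyter kernel is described by a `kernel.json` file whose `argv` names the interpreter to launch. The sketch below builds such a spec for a conda environment and writes it to a scratch directory. The prefix `/opt/conda/envs/qiime2-2021.8` is an assumption about where the environment lives on the hub, the helper names are hypothetical, and the environment would also need `ipykernel` installed for the kernel to start.

```python
import json
import tempfile
from pathlib import Path


def build_kernel_spec(env_prefix: str, display_name: str) -> dict:
    """Return a kernel spec that launches ipykernel with the environment's Python."""
    return {
        "argv": [
            str(Path(env_prefix) / "bin" / "python"),
            "-m", "ipykernel_launcher",
            "-f", "{connection_file}",  # placeholder filled in by Jupyter at launch
        ],
        "display_name": display_name,
        "language": "python",
    }


def write_kernel_spec(kernels_dir: Path, name: str, spec: dict) -> Path:
    """Write <kernels_dir>/<name>/kernel.json and return its path."""
    target = kernels_dir / name / "kernel.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(spec, indent=2))
    return target


# Assumed environment prefix; adjust to the real location on the hub.
spec = build_kernel_spec("/opt/conda/envs/qiime2-2021.8", "Python (qiime2-2021.8)")
with tempfile.TemporaryDirectory() as scratch:
    path = write_kernel_spec(Path(scratch), "qiime2-2021.8", spec)
    print(path.read_text())
```

On Linux, per-user kernels are normally looked up under `~/.local/share/jupyter/kernels/<name>/kernel.json`, so pointing `kernels_dir` there instead of a scratch directory is what would make the kernel appear in the notebook launcher.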

@kferger320

@balajialg Ideally we would use this package as part of a python tutorial, so it would be best if we could get it working in a Jupyter notebook if possible.

@petersudmant
Contributor

@kferger320 @balajialg I have a solution - but, it's a tiny bit tedious. I'm trying something else ( I think it's a good fix ) - but it's compiling on my computer now and I want to go to bed - I'll check in tomorrow ok?

@balajialg
Contributor Author

@petersudmant Makes sense! Considering that
i) Classes will happen using the hub during this semester and
ii) Our limited bandwidth,
We can keep the same env setup until the end of the semester and rethink cleaning it up later.

@kferger320 reported an issue using the qiime package some time back. Sharing the log content for your reference:

/opt/conda/envs/qiime2/lib/python3.8/site-packages/scipy/stats/stats.py:3650: F_onewayConstantInputWarning: Each of the input arrays is constant;the F statistic is not defined or infinite
  warnings.warn(F_onewayConstantInputWarning())
Traceback (most recent call last):
  File "/opt/conda/envs/qiime2/lib/python3.8/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "<decorator-gen-531>", line 2, in ancom
  File "/opt/conda/envs/qiime2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    outputs = self._callable_executor_(scope, callable_args,
  File "/opt/conda/envs/qiime2/lib/python3.8/site-packages/qiime2/sdk/action.py", line 459, in _callable_executor_
    viz = qiime2.sdk.Visualization._from_data_dir(temp_dir,
  File "/opt/conda/envs/qiime2/lib/python3.8/site-packages/qiime2/sdk/result.py", line 381, in _from_data_dir
    viz._archiver = archive.Archiver.from_data(
  File "/opt/conda/envs/qiime2/lib/python3.8/site-packages/qiime2/core/archive/archiver.py", line 316, in from_data
    Format.write(rec, type, format, data_initializer, provenance_capture)
  File "/opt/conda/envs/qiime2/lib/python3.8/site-packages/qiime2/core/archive/format/v5.py", line 20, in write
    super().write(archive_record, type, format, data_initializer,
  File "/opt/conda/envs/qiime2/lib/python3.8/site-packages/qiime2/core/archive/format/v1.py", line 25, in write
    provenance_capture.finalize(
  File "/opt/conda/envs/qiime2/lib/python3.8/site-packages/qiime2/core/archive/provenance.py", line 319, in finalize
    self.write_action_yaml()
  File "/opt/conda/envs/qiime2/lib/python3.8/site-packages/qiime2/core/archive/provenance.py", line 296, in write_action_yaml
    fh.write(yaml.dump({'execution': self.make_execution_section()},
  File "/opt/conda/envs/qiime2/lib/python3.8/site-packages/qiime2/core/archive/provenance.py", line 262, in make_execution_section
    runtime['start'] = start = _ts_to_date(self.start)
  File "/opt/conda/envs/qiime2/lib/python3.8/site-packages/qiime2/core/archive/provenance.py", line 33, in _ts_to_date
    time_zone = tzlocal.get_localzone()
  File "/opt/conda/envs/qiime2/lib/python3.8/site-packages/tzlocal/unix.py", line 165, in get_localzone
    _cache_tz = _get_localzone()
  File "/opt/conda/envs/qiime2/lib/python3.8/site-packages/tzlocal/unix.py", line 86, in _get_localzone
    tz = pytz.timezone(etctz.replace(' ', '_'))
  File "/opt/conda/envs/qiime2/lib/python3.8/site-packages/pytz/__init__.py", line 188, in timezone
    raise UnknownTimeZoneError(zone)
pytz.exceptions.UnknownTimeZoneError: '/UTC'

Based on my google search, I found that this could be a suitable solution for this error. Assuming it is correct, I created this PR. Can you let me know if both my diagnosis and the solution for this issue are correct?
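
The last frames of the traceback show a timezone name being read from the system and handed to `pytz`, which rejects `'/UTC'`. A way to narrow this down without changing the image might be to print the names the system advertises. The sketch below assumes the usual Linux sources (the `TZ` environment variable and `/etc/timezone`); which of them the hub image actually sets is not known from this thread, and the function names are made up for illustration.

```python
import os
import tempfile
from pathlib import Path


def timezone_sources(root: str = "/") -> dict:
    """Collect the timezone names the system advertises, without validating them."""
    etc_timezone = Path(root) / "etc" / "timezone"
    return {
        "TZ": os.environ.get("TZ"),
        "/etc/timezone": etc_timezone.read_text().strip() if etc_timezone.is_file() else None,
    }


def looks_malformed(name) -> bool:
    """Heuristic: tz database keys are relative, so a leading '/' is suspicious."""
    return name is not None and name.startswith("/")


# Inspect the machine this runs on.
for source, name in timezone_sources().items():
    print(f"{source}: {name!r} (suspicious: {looks_malformed(name)})")

# Reproduce the situation from the traceback in a scratch directory.
with tempfile.TemporaryDirectory() as fake_root:
    (Path(fake_root) / "etc").mkdir()
    (Path(fake_root) / "etc" / "timezone").write_text("/UTC\n")
    print(timezone_sources(fake_root)["/etc/timezone"])
```

A name such as `/UTC` showing up in either source would point at the system configuration rather than at a missing package.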

@petersudmant
Contributor

petersudmant commented Oct 8, 2021

Hmm - I mean, the question is DOES this fix the issue? @balajialg If so, GREAT!

@balajialg
Contributor Author

balajialg commented Oct 8, 2021

@petersudmant Great question! I am not sure. I assumed this might be a fix, as a few other folks who ran into a similar timezone-related issue resolved it by installing this library. What would be a good way to test whether this solution works without messing with the docker environment, considering that the sudo apt install command is not available?

We tried doing pip install tzdata in the notebook but that did not seem to fix the error. I couldn't find any other solution that may be relevant here. @kferger320 was open to testing whether this fix worked in the staging environment. What do you think?

@petersudmant
Contributor

@balajialg I think that's a reasonable approach! I test on my laptop, but, this seems like a good way too!

@balajialg
Contributor Author

balajialg commented Oct 9, 2021

@petersudmant sounds great! I am merging the changes to staging.

@kferger320 Please do test this and let us know whether this solution actually works!

balajialg added a commit that referenced this issue Oct 9, 2021
Installing library tzdata as part of docker image in biology hub for issue #2737
@kferger320

@petersudmant @balajialg I think this solved it! I was able to get all of our commands to run on staging now without error. Thanks to you both for all of your help, we really appreciate it!!

@petersudmant
Contributor

Great! good idea @balajialg !

@balajialg
Contributor Author

Thanks, @kferger320! That's great to know. FYI, Changes from staging got merged in biology.datahub.berkeley.edu.

@kferger320

Wonderful! Just tested it again and it works great. Thanks again everyone for all of your help with this!

@kferger320

Hi all @yuvipanda @balajialg @petersudmant @arwolf0808, the error that seemed to be solved by installing tzdata is popping up again. The same commands that were working great last week are suddenly failing with the same pytz.exceptions.UnknownTimeZoneError: '/UTC' error. Maybe something got moved around? This is strange, since everything was working great on Thursday in class. It would be great to have this resolved ASAP, as students need to be able to run these commands for their homework.

@balajialg
Contributor Author

@kferger320 Sorry to hear that this issue is recurring. Were you able to run this command yesterday? We made changes to the biology hub today and are trying to see whether that correlates with this behavior.

@yuvipanda @petersudmant Is this issue related to this change, which seems to be the only change made after the addition of tzdata to the Dockerfile?

@kferger320

@balajialg thanks for looking into this. I think both Ashley and I only tested it today, so I can't say if it was working yesterday- we just know that it was working fine a few days ago like I mentioned

@yuvipanda
Contributor

@balajialg The tzdata package is already installed by default in all our images - any installation of python installs it by default, as it is listed as a dependency in https://packages.ubuntu.com/focal/libpython3.9-stdlib. So I don't think the explicit install had any effect unfortunately on the situation. Needs deeper investigation.
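
The distinction drawn here, a tz database that is present versus a timezone name that cannot be found in it, can be illustrated with a small lookup. This sketch assumes the database lives in `/usr/share/zoneinfo`, the usual location on Debian and Ubuntu systems, and uses a hypothetical helper `find_zone_file`; it does not reproduce how `pytz` or `tzlocal` resolve names internally.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def find_zone_file(name: str, search_dirs: Iterable[str]) -> Optional[Path]:
    """Return the tz database file for `name`, or None if the key is unusable."""
    if not name or name.startswith("/"):
        return None  # keys are relative, e.g. "UTC" or "America/Los_Angeles"
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


# Assumed database location; prints None for every name if the directory is absent.
for key in ("UTC", "America/Los_Angeles", "/UTC"):
    print(f"{key!r}: {find_zone_file(key, ['/usr/share/zoneinfo'])}")

# Self-contained demonstration with a stand-in database directory.
with tempfile.TemporaryDirectory() as fake_db:
    (Path(fake_db) / "UTC").write_bytes(b"")
    print(find_zone_file("UTC", [fake_db]) is not None)
    print(find_zone_file("/UTC", [fake_db]) is None)
```

With the database installed, `UTC` resolves while `/UTC` does not, which is consistent with an explicit tzdata install making no difference to the error.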

@yuvipanda yuvipanda reopened this Oct 20, 2021
felder added a commit that referenced this issue Oct 21, 2021
Trying to set timezone to UTC instead of /UTC for #2737
felder added a commit that referenced this issue Oct 21, 2021
3rd attempt to fix timezone for #2737
@felder
Contributor

felder commented Oct 21, 2021

The issue here is the machine’s timezone was somehow set to “/UTC” as opposed to “UTC” and pytz did not like that.

I’m deploying a fix now to prod that sets the timezone correctly. The error went away in our test notebook when I deployed the fix to staging.

@petersudmant In a related question, is UTC the timezone you want for biology hub vs America/Los_Angeles ?

@petersudmant
Contributor

@felder GOOD CATCH! Sure, that's the time zone we are in, so why not!

@arwolf0808 @kferger320 why in the WORLD does qiime need to know the time >.< 🥇

@yuvipanda
Contributor

@felder great, ty! I think we set all our other hubs to America/Los_Angeles, so let's do that for this one too?

@arwolf0808

arwolf0808 commented Oct 21, 2021 via email

@felder
Contributor

felder commented Oct 21, 2021

@arwolf0808 the fix for the UTC timezone was deployed to staging and prod last night.

@balajialg
Contributor Author

balajialg commented Oct 21, 2021

@felder Thank you so much for fixing this issue! Appreciate it.

@arwolf0808 I hope you are able to run the code. Here is a snapshot from @kferger320's instance in https://biology.datahub.berkeley.edu/ for your reference
image

@felder
Contributor

felder commented Oct 21, 2021

@kferger320 can you confirm that this is working correctly now? If so I'll close the issue.

@kferger320

@felder @balajialg Yes just tried a few commands and looks like everything's working normally again! Thanks again all for your help with this.

@yuvipanda
Contributor

Thanks a lot for fixing it up, @felder!

6 participants