Install most packages from conda-forge, instead of pypi #2934
Comments
@yuvipanda Hi, just dropping in here! Is there a reason that you can't get the R packages from conda-forge? e.g. |
Basically, for Python packages, we should install them with conda via environment.yml if they exist on conda-forge, and use pip otherwise.
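A minimal sketch of that rule, assuming a hub image currently lists its packages in a requirements.txt. The `ON_CONDA_FORGE` set, both function names, and the pip-only package name in the usage note are hypothetical and only there for illustration; a real migration would check each package against the conda-forge channel by hand.

```python
# Hypothetical set of packages assumed to be available on conda-forge.
# A real workflow would check the channel instead of hard-coding this.
ON_CONDA_FORGE = {"cartopy", "shapely", "numpy"}


def split_requirements(lines, on_conda_forge=ON_CONDA_FORGE):
    """Split requirements.txt-style lines into (conda_specs, pip_specs).

    Only bare names and `name==version` pins are recognised as conda
    candidates; any other specifier is left in the pip list untouched.
    """
    conda, pip = [], []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition("==")
        name = name.strip().lower()
        if name in on_conda_forge:
            # Rewrite the pip-style '==' pin as a conda-style '=' pin.
            conda.append(f"{name}={version.strip()}" if sep else name)
        else:
            pip.append(line)
    return conda, pip


def render_environment_yml(conda, pip):
    """Render a minimal environment.yml as a string."""
    out = ["channels:", "  - conda-forge", "dependencies:"]
    out += [f"  - {spec}" for spec in conda]
    if pip:
        # Packages not found on conda-forge stay in a nested pip section.
        out += ["  - pip", "  - pip:"]
        out += [f"      - {spec}" for spec in pip]
    return "\n".join(out) + "\n"
```

For example, `split_requirements(["cartopy==0.20.0", "shapely", "some-pip-only-package==1.0"])` puts the first two entries in the conda list and leaves the last one for pip, and `render_environment_yml` turns the two lists into the file text.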
@agoose77 Most of the R community I know of would like to use |
I see what you're saying. I'm not familiar with the R toolchain - is it possible to use different conda environments for RStudio vs the Python kernels?
@agoose77 most of our R users use R via RStudio, so conda and Jupyter kernels are completely uninvolved there.
@yuvipanda sure, let me clarify! My understanding of your situation is:
I am wondering whether it makes sense to drop the need for apt packages entirely by installing RStudio itself in a separate environment to Python, and then have your entry-point such that this is invisible to the user. This is just so that there is a clearer isolation / separation between the system and the application environments (RStudio, Python). Functionally this wouldn't be much different from using apt for the R dependencies, except that it keeps everything on Conda.
Ah, so they're installed in the same Docker image, but R doesn't know anything about conda at all, so they aren't in the same 'conda environment'. The proposal in this issue uses conda for all Python, and R's native package installation (from CRAN) + apt for R. The scripts that R users distribute often have |
Right! My suggestion was only that putting R inside a separate Conda env would allow you to avoid using APT for R, because |
@agoose77 ah, ok - I'll consider that :) I'm quite reluctant to use conda for R, as I feel the general R community is much more focused on CRAN and apt than on conda. packagemanager.rstudio.com offers prebuilt binary packages for all of CRAN, while only a subset of packages is available on conda-forge. In my ideal world, I'd not use conda for python packages either (so I don't have to mix them!) - and at least for now it looks like I can do that (avoid mixing!) with R.
Yeah, I don't like mixing my pip with conda-forge packages (and therefore tend to rely solely on PyPI). It would be nice if there were an abstraction layer for tools like poetry such that PyPI + conda-forge could be used by the tool.
Submitted PR for julia and asked @yuvipanda to review just to make sure we're on the same page.
Moving requirements.txt to environment.yml for julia issue #2934
Moving requirements.txt to environment.yml for biology issue #2934
Moving requirements.txt to environment.yml for data8 issue #2934
dlab uses the datahub user image.
@yuvipanda looking at eecs hub, would https://anaconda.org/conda-forge/py-opencv be the same as opencv-python?
@felder Is this issue in scope for Fall 22, should it be moved to Spring 23, or is it irrelevant at this juncture?
@balajialg could be in scope for Fall. My understanding is we're also going to update the base image and do some package management. Could be this gets done as part of that work.
@felder Sounds good. Thanks!
Currently, we get most of our python packages from pypi.org, installed via pip. A lot of scientific python packages have C extensions, and installing them from pypi has been simple enough thanks to manylinux wheels. However, there are some packages - particularly in the geo sciences - that are a pain in the ass to install this way still.
#2824 is one such case. The cartopy project does not ship manylinux wheels, so we need to install its C dependencies - proj, geos, gdal, etc - from apt. This also has a knock-on effect for other packages that depend on proj, like shapely. Shapely does have binary wheels, but because cartopy and shapely must link to the same proj library, shapely must be built from source too - or you run into problems like #1796.
This becomes even more complicated when we add R to the mix. The sf R package also needs proj, and since we're installing it from packagemanager.rstudio.com, it's linked against the version of proj that is available in apt.
So to recap, the following package managers are involved:
This was a bit of a tenuous situation, but the need to upgrade cartopy for #2824 totally made this unworkable. Cartopy 0.20 needed a newer version of proj than what was available in apt. With #2826, we tried to install a newer version of proj from conda (adding yet another package manager to the mix!), but this required we remove proj installed via apt - as otherwise pip was still trying to link to that, and that doesn't work. And once we removed proj from apt, this broke the R sf package, as it required proj from apt!
I think the core of the problem is that both pip and R are dependent on apt for some C libraries, and this can conflict. I propose instead that we:
The scientific python ecosystem has a lot of good support for conda, so I think this will also simplify our lives a bit. We'll still be getting some python packages from pip, but as long as we're getting most packages that link against C libraries from conda, I think we're ok.
Let's move these one hub image at a time, starting with the easiest.
environment.yml
If we get similar versions from conda to the ones we get from pip right now, I think this would work out ok. It should also make builds faster.
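One way to check the "similar versions" condition is to compare the pins before and after a move. The sketch below assumes "similar" means the same major.minor version, which is an illustrative choice and not something this issue defines; the function names are hypothetical.

```python
def major_minor(version):
    """Return the first two dot-separated components of a version string."""
    return tuple(version.split(".")[:2])


def version_drift(pip_pins, conda_pins):
    """Return {name: (pip_version, conda_version)} where major.minor differ.

    Both arguments map package names to version strings. Packages that
    appear in only one of the two mappings are skipped.
    """
    drift = {}
    for name, pip_version in pip_pins.items():
        conda_version = conda_pins.get(name)
        if conda_version is None:
            continue
        if major_minor(pip_version) != major_minor(conda_version):
            drift[name] = (pip_version, conda_version)
    return drift
```

Calling `version_drift` with the old pip pins and the versions conda resolves returns only the packages whose major.minor changed, which are the ones worth a closer look before switching an image over.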