GPU Support #7

Closed
hmaarrfk opened this issue Jul 10, 2019 · 80 comments
@hmaarrfk
Contributor

@jjhelmus it seems you were able to build PyTorch with GPU support without needing to have variants

https://anaconda.org/anaconda/pytorch/files?version=1.0.1

Is that true?

If so, what challenges do you see moving this work to conda-forge?

@jakirkham
Member

Probably a few things. Here are some thoughts based on things we have been working on to get GPU packages to build.

  1. Using different Docker images.
  2. Requiring the nvcc compiler. Here's an example.
  3. Tying the nvcc compiler version to the Docker image.
  4. General reworking of conda-smithy, staged-recipes, and other infrastructure to handle this.

The next thing we need to figure out is how to test the packages. There has been some good discussion and some investigation into possible options. Still more to do here though.
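As a rough illustration of item 2, here is a minimal sketch of what requiring nvcc could look like in a recipe's meta.yaml — the package name and versions are purely hypothetical, and the `{{ compiler('cuda') }}` pattern is the direction the nvcc work has been heading rather than a finished convention:

```yaml
# Hypothetical recipe fragment: request nvcc alongside the C/C++ compilers.
package:
  name: some-gpu-package        # illustrative name only
  version: 0.1.0

requirements:
  build:
    - {{ compiler('c') }}
    - {{ compiler('cxx') }}
    - {{ compiler('cuda') }}    # resolves to an nvcc package tied to the Docker image
  host:
    - python
  run:
    - python
    - cudatoolkit               # CUDA runtime needed when the package is used
```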

@jakirkham
Member

@soumith, do you have any thoughts on this? 🙂

@soumith

soumith commented Jul 11, 2019

@jakirkham 's plan sounds about right. The PyTorch official conda binaries in the pytorch channel have been built the same way and the scripts are at https://github.com/pytorch/builder/tree/master/conda

@jjhelmus

jjhelmus commented Jul 11, 2019

The recipes used to build the pytorch packages in the defaults channel can be found in the pytorch-feedstock directory of the aggregate repository. These are built using the conda-provided compilers but need nvcc and the CUDA runtime library for testing, which are provided by the appropriate Docker images.

My understanding is that PyTorch does dynamic loading of the CUDA libraries, and therefore the package built with GPU support will work on systems without a GPU. A CPU-only variant would still be a nice addition, since the GPU variant is a large download and requires the cudatoolkit and cudnn packages, which are also quite large.
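If a CPU-only variant were added, one way to express the two builds would be a cuda_compiler_version entry in the feedstock's conda_build_config.yaml, roughly like the sketch below (a hedged illustration; the concrete versions are made up and not what defaults or conda-forge actually pin):

```yaml
# Hypothetical conda_build_config.yaml fragment producing a CPU-only
# variant alongside a GPU variant.
cuda_compiler_version:
  - None     # CPU-only build: no cudatoolkit/cudnn run dependency
  - 10.0     # GPU build: links against CUDA 10.0
```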

@jakirkham
Member

I think we are now in a good place to try building a GPU enabled pytorch package in conda-forge. Happy to give this a go if that sounds reasonable. 🙂

@hmaarrfk
Contributor Author

hmaarrfk commented Sep 30, 2019 via email

@soumith

soumith commented Sep 30, 2019

@jakirkham why would you like gpu-built pytorch in conda-forge? We already provide high-quality packages in the pytorch channel and I am really worried about support. Like, whenever a new pytorch version releases, the conda-forge one will be a bit behind and then there will be all kinds of conflicts. I'm putting into context the conversation on torchvision conda-forge repo that happened yesterday.

@hmaarrfk
Contributor Author

@soumith, thanks for keeping up with the conversation. XREF: conda-forge/torchvision-feedstock#2

Adding pytorch to conda-forge has several advantages:

  1. It would allow developers to have packages that explicitly depend on pytorch.
  2. In theory, it would help streamline the installation of multiple different packages beyond those that exist in the default channel. The default channel has the basics, but for many things, I find it lacking. Pointing people to conda-forge (or pip) for software is something I find myself doing from time to time.
  3. It allows the use of a consistent set of compilers, which avoids ABI incompatibility.

I agree that uploading the torchvision package before the pytorch package was in place was likely a mistake. For users that depend on torchvision, I think the 0.2 package version is correct. That said, pytorch moves so quickly that users need to be mindful of what version they install. I think the particular user would likely benefit from having the dependency torchvision>=0.4 in their spec, fixing the current incompatibility.

As for being behind on the builds, part of that is that the conda-forge infrastructure isn't set up to automatically detect new tags on GitHub. The tarballs uploaded/generated by your team to GitHub do not contain the third-party libraries, so in order to build everything, I had to use the git repo. I think I can add the tarball to trick the updater into automatically rebuilding, but at this point, the Azure machines just ran out of RAM.

I might have to take inspiration from your work, https://github.com/pytorch/builder/tree/master/conda, to find a solution. Honestly, help in building the package the right way would be appreciated, but I understand if you find that hard to justify.

Finally, the last advantage is that conda-forge is also looking beyond x86 architectures, and using the conda-forge platform would enable a pathway toward arm/ppc builds (though they are blocked on the graphics stack for now).

@kkraus14

@jakirkham why would you like gpu-built pytorch in conda-forge? We already provide high-quality packages in the pytorch channel and I am really worried about support. Like, whenever a new pytorch version releases, the conda-forge one will be a bit behind and then there will be all kinds of conflicts. I'm putting into context the conversation on torchvision conda-forge repo that happened yesterday.

From the perspective of another maintainer of GPU packages that reside in a different channel than conda-forge, it makes dependency management and end user experience much nicer / easier when they have a one stop shop to get their packages. From what I've seen users typically don't modify their .condarc file, and just add channels to individual install commands, and then things get unexpectedly downgraded / upgraded and the end user has a bad time.

@hmaarrfk
Contributor Author

hmaarrfk commented Oct 1, 2019 via email

@rgommers

rgommers commented Oct 1, 2019

This issue and the related discussion in conda-forge/torchvision-feedstock#2 do point out some real issues with the conda/conda-forge model. Questions like:

  • Should project maintainers maintain their own conda-forge feedstocks, publish to their own channel, or both?
  • Why aren't projects maintaining their own conda-forge feedstocks (or any conda packages for that matter)? Should we want that and ask them to?
  • Does everything need to be in conda-forge? If so, what's the point of channels? If not, why can't we have cross-channel dependencies?
  • Is "everything in conda-forge" even scalable (the last 1-1.5 years suggest not)?

Note, I know this is probably not the optimal place to discuss this, and neither is Twitter (Cc @mrocklin and @jph00). But what is?

I honestly don't know the answers to any of these questions, and that's pretty telling given that I've been involved in packaging for a long time and am a maintainer on the NumPy and SciPy feedstocks. I've just scanned through the conda-forge docs again, and it doesn't provide answers.

Adding pytorch to conda-forge has several advantages:

  1. It would allow developers to have packages that explicitly depend on pytorch.

This is a rule that's more social than a hard technical requirement.

  2. In theory, it would help streamline the installation of multiple different packages beyond those that exist in the default channel. The default channel has the basics, but for many things, I find it lacking. Pointing people to conda-forge (or pip) for software is something I find myself doing from time to time.

Again social. The pytorch channel has a well-maintained and complete set of packages that could be relied on.

  3. It allows the use of a consistent set of compilers, which avoids ABI incompatibility.

This is true, maybe, sometimes. Package maintainers, with a few notable exceptions (Arrow, Tensorflow 1.x EDIT: not even true anymore for Arrow it looks like, 0.14.1 has binary wheels. Tensorflow 2.0 also has compliant manylinux wheels now), do make this work on PyPI so there's no real reason it couldn't be made to work cross-channel within conda given the right set of conventions/specs/tools.

@h-vetinari
Member

h-vetinari commented Oct 1, 2019

Mixing channels has not worked out so well historically, which is why we now have --strict-channel-priority and so many packages migrating to conda-forge (which obviously has many other reasons too).

It would be interesting to see what some "cross-compatible" channels would look like (or how that could ever be enforced in a way that gets the blessing of conda/conda-forge), but while it is mostly a social convention (as @rgommers mentions), channels have a big impact in corporate environments, where the rest of the internet is usually behind a proxy. Getting anything other than the main channels + conda-forge past IT / sysadmins / etc. is a hassle, both procedurally and technically, so every channel has a substantial incremental cost, while conda-forge just works (after the initial setup).

@rgommers: Note, I know this is probably not the optimal place to discuss this, and neither is Twitter (Cc @mrocklin and @jph00). But what is?

CC @conda-forge/core @mingwandroid

Edit: Probably best to start (at least) at Ralf's post.

@isuruf
Member

isuruf commented Oct 1, 2019

Package maintainers, with a few notable exceptions (Arrow, Tensorflow 1.x), do make this work on PyPI so there's no real reason it couldn't be made to work cross-channel within conda given the right set of conventions/specs/tools.

pip doesn't respect version constraints of already installed packages, which makes it easy to break environments.

Who sets this right set of conventions? Even defaults and conda-forge can't agree on conventions.
For example, in conda-forge, we provide different BLAS implementations and users can decide at install time, but this is not the case with pytorch, which requires MKL.
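(For anyone unfamiliar with that mechanism, the BLAS implementation is chosen at install time through conda-forge's switchable BLAS metapackages; a hypothetical environment.yml pinning the MKL variant might look like this:)

```yaml
# Hypothetical environment.yml selecting the MKL BLAS variant at install time.
name: mkl-example
channels:
  - conda-forge
dependencies:
  - numpy
  - libblas=*=*mkl
```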

Should project maintainers maintain their own conda-forge feedstocks, publish to their own channel, or both?

It's up to the maintainers.

Why aren't projects maintaining their own conda-forge feedstocks (or any conda packages for that matter)? Should we want that and ask them to?

There are some people who do maintain their feedstocks outside of conda-forge. conda-smithy supports creating feedstocks and uploading to a custom channel.

Does everything need to be in conda-forge? If so, what's the point of channels? If not, why can't we have cross-channel dependencies?

No. Packages can be in other channels, but cross-channel dependencies mean we lose control. conda-forge's community does a lot of work to maintain ABI compatibility and the ability to create consistent environments. @conda-forge/core is frequently called on to help merge PRs in feedstocks that have been abandoned.

@rgommers

rgommers commented Oct 1, 2019

pip doesn't respect version constraints of already installed packages which makes it easy to break environments.

that's not really relevant here (and will be solved at some point)

Who sets these right set of conventions? Even defaults and conda-forge can't agree on conventions.

This will need solving one way or another I think.

For example, in conda-forge, we provide different BLAS implementations and users can decide at install time, but this is not the case with pytorch which requires MKL.

PyTorch relies on MKL features beyond plain BLAS, like FFT-related functionality. So the BLAS switching isn't relevant here. A conda-forge PyTorch package simply must depend on MKL directly.

Should project maintainers maintain their own conda-forge feedstocks, publish to their own channel, or both?

It's up to the maintainers.

That's not a real answer. The point is: project maintainers may very well be willing to help, but it's not clear how. The conda-forge team/community needs to have a clear vision here. Right now the norm for releasing any project is: release on PyPI as wheels and sdist, then let someone else worry about conda-forge (that someone else could be an individual project maintainer, or a user, or a conda-forge/core member - but it's not the project release manager normally). The norm could change to having releases to conda-forge, or to a custom conda channel, be part of the official project release procedure.

There are some people who do maintain their feedstocks outside of conda-forge. conda-smithy supports creating feedstocks and uploading to a custom channel.

This sounds like there could be part of an answer in here ....

No. Packages can be in other channels, but cross-channel dependencies means we lose control.

That's missing the point - you have "control" over this pytorch-cpu-feedstock, but for users it is unhelpful that it even exists, and from a maintainer's point of view, why would anyone want to spend double the effort to maintain PyTorch builds in two channels (conda-forge and pytorch)?

@isuruf
Member

isuruf commented Oct 1, 2019

The point is: project maintainers may very well be willing to help, but it's not clear how.

and from a maintainer point of view why would anyone want to spend double effort to maintain PyTorch builds in two channels

I'm not sure I understand. What do the maintainers want to do? Custom channel or in conda-forge?

@h-vetinari
Member

h-vetinari commented Oct 1, 2019

I'm not sure I understand. What do the maintainers want to do? Custom channel or in conda-forge?

Probably best to start (at least) at Ralf's post. I've edited my post accordingly.

@isuruf
Member

isuruf commented Oct 1, 2019

PyTorch relies on MKL features beyond plain BLAS, like FFT-related functionality. So the BLAS switching isn't relevant here. A conda-forge PyTorch package simply must depend on MKL directly.

Okay. How about the fact that the pytorch conda packages in the pytorch channel require GLIBC 2.17 (CentOS 7) while conda-forge uses GLIBC 2.12 (CentOS 6)?

Everything boils down to the set of conventions used. Would other channels like pytorch be open to using the set of conventions that conda-forge use?

@pearu

pearu commented Oct 1, 2019

The fact that conda-forge uses GLIBC 2.12 (CentOS 6) is a showstopper for recipes that want to install the latest cudatoolkit (currently 10.1), whose installer requires GLIBC 2.14 or newer. For instance, the PR conda-forge/cudatoolkit-dev-feedstock#16 is held back because of this, and I would actually hope that conda-forge starts to use a newer GLIBC rather than other channels using the old GLIBC.

@isuruf
Member

isuruf commented Oct 1, 2019

@pearu, can you open a different issue for increasing GLIBC version?

(For others who are curious, the latest cudatoolkit does not require GLIBC 2.14; see the documentation at https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html. The linked PR was just using a wrong URL.)

(Off-topic: there are people interested in making manylinux2010 (centos6) wheels from conda packages and increasing the GLIBC version would stop that.)

@bgruening

Hi all,

thanks for this discussion. Let me try to provide my experience as someone who has contributed to the conda ecosystem for 4 years and has migrated (with a lot of help) more than 1000 packages to conda-forge.

Should project maintainers maintain their own conda-forge feedstocks, publish to their own channel, or both?

On conda-forge or a community channel (and here I mean broader scientific fields, not project-wide communities). I would always recommend conda-forge only, for multiple reasons, but the most obvious one is that you want your users to use as few channels as possible. It is never good to have too many channels/PPAs/... activated.

Why aren't projects maintaining their own conda-forge feedstocks (or any conda packages for that matter)? Should we want that and ask them to?

They do! I know a lot of upstream maintainers who do that, and I would always recommend it. What happens on staged-recipes is that upstream maintainers are very often pinged and asked if they want to co-maintain the feedstock. Imho this works very well.

Does everything need to be in conda-forge?

Everything that is stable, yes, why not. Keep in mind to keep the number of channels as low as possible.
There are other ways, like Bioconda is doing; I will get to this later.

If so, what's the point of channels?

For unstable stuff, for training, for beta versions, for people who don't like to play with communities or don't recognize the complexity of integration. It's a matter of providing freedom and trying different community models (Bioconda vs. conda-forge etc.). It's good to have this choice. As a matter of fact, the conda-forge model (even if not perfect) is the most scalable approach that we have seen yet. And I'm a Bioconda core member saying this.

Please keep in mind that multiple channels are always more complicated to maintain than just one. Another example is namespace clashes. You have these far better under control in one channel than in 10 channels.

If not, why can't we have cross-channel dependencies?

We can. But we need to play together. Bioconda, a channel with 7000 bioinformatics packages, depends on conda-forge. We recommend the channel order conda-forge > bioconda. We have linters in conda-forge and Bioconda that prevent name clashes. Bioconda members are part of conda-forge, we agree on the same glibc version, we sync the pinnings - we (Bioconda) essentially follow conda-forge and discuss together with them how conda-forge evolves, since we depend on them.
No redundancy, high-quality packages, and everyone is happy!

But, there needs to be the will to work together and invest time and effort to make this happen.

Is "everything in conda-forge" even scalable (the last 1-1.5 years suggest not)?

Not sure what you are referring to, but as someone who has run >1000 environments over 4 years on all kinds of infrastructure from HPC to cloud, and as a maintainer of BioContainers (building containers out of conda packages) ... conda is scalable. The most scalable package manager that I have seen so far ... yes, even more scalable than dpkg and such. But this also means it's way more complex.

If you acknowledge that, and if you have seen how this community can maintain >10 languages, >1000 R packages (that are rebuilt every 6 months), that we have rebuilt everything against new compilers, and that if boost or any other library (such as zlib) gets a new version, all dependent packages are rebuilt (also in Bioconda), then you would probably also say that conda-forge is scalable :)

@pearu:

The fact that conda-forge uses GLIBC 2.12 (CentOS 6) is a showstopper for recipes that want to install the latest cudatoolkit (currently 10.1), whose installer requires GLIBC 2.14 or newer. For instance, the PR conda-forge/cudatoolkit-dev-feedstock#16 is held back because of this, and I would actually hope that conda-forge starts to use a newer GLIBC rather than other channels using the old GLIBC.

That is exactly what @isuruf was trying to say. It is a kind of agreement that many hundreds of people have made, and there are valid reasons for sticking to GLIBC 2.12. For example, all HPC environments I know are running CentOS 6 (or similar systems), and this will not go away soon. Gosh, I have seen HPC with CentOS 5 a year ago :(
So if you are proposing to just update GLIBC, you are breaking the workflow and the accessibility of many thousands of users. And the reason is some proprietary binary blob? However, if you like, let's start this discussion, get people's opinions, and ask the scientific community.

In the end my key-points are:

  • Building a scientific package stack is freaky hard, and the integration part is very often just underestimated by many people.
  • Conda bridges programming languages, which makes it even harder; it tries and promises to mix arbitrary environments. So if we want to hold this promise, we need to integrate more with each other, not less.
  • conda-forge is scalable.
  • If you want your own channel, mirror the Bioconda solution. That is: depend on conda-forge, integrate with the conda-forge linting so name clashes are avoided and you get pinged if something relevant is happening, and join the conda-forge meetings.
  • If you don't want to do this, I think you also need to accept that at some point someone will want your package in conda-forge (for various reasons, but also to reduce the number of incompatible activated channels) and we cannot prevent them from doing so.

Thanks again for starting this discussion, happy to answer any question also in regard to Bioconda and how we maintain a separate channel but stay compatible with conda-forge ❤️

@rgommers

rgommers commented Oct 1, 2019

Everything boils down to the set of conventions used.

It's important, but not the right place to start. Needed conventions follow from goals/requirements that affect users/projects/maintainers/redistributors/etc. So I'll start there.

As a user, I want to be able to use conda to install the latest, feature-complete version of any package I need. I don't really care too much how, I just care that it's robust. So typing something like conda install numpy torch should get me numpy 1.17.2 and pytorch 1.2.0 (append -c somechannel if needed). What I don't want is to pick up outdated or broken versions. I can learn that defaults may be slightly behind or not contain some packages, so then I go to conda-forge, or make that my default for everything, for example. But if conda install torch -c conda-forge works, I get unhappy if I have to find out the hard way that that got me PyTorch 1.1 without GPU support.
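(As a side note on the "learnable" part, the usual way to make that stick is a one-time .condarc change; a hypothetical ~/.condarc preferring conda-forge with strict priority might look like this:)

```yaml
# Hypothetical ~/.condarc: conda-forge first, strict channel priority so
# packages are not silently mixed across channels.
channels:
  - conda-forge
  - defaults
channel_priority: strict
```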

As a maintainer, I want to make my package available to all conda and pip users as soon as possible when I tag a new release. With the least work possible, following some standard procedure each release. Again, I don't really care too much how, happy to take guidance there; among the options:

  1. release sdist and wheels to PyPI, let other packaging teams (Debian, Homebrew, conda-forge, etc.) take it from there
  2. release to PyPI and conda-forge in parallel myself
  3. release to PyPI and my own conda channel in parallel myself; let conda-forge sync with my channel somehow
  4. release to one conda channel only, and let conda-press produce wheels from that that I then upload to PyPI (long-term even nicer, only one build toolchain)

Right now, NumPy, SciPy and the majority of packages do (1). PyTorch does the first part of (3), but conda-forge doesn't "sync" correctly.

As an ecosystem-wide contributor, I want to be able to tell users how to easily install and use large parts of the NumPy/PyData ecosystem. Ideally this is something like "download [Ana|Mini]conda, open your IDE of choice and work through these tutorials" followed by "if something is not in the defaults channel, do X". This is harder today than it was 3 years ago ...

Would other channels like pytorch be open to using the set of conventions that conda-forge use?

Note that I'm not a PyTorch maintainer (although I am contributing) so I won't try to answer that for @soumith. I believe this problem isn't really PyTorch-specific though.

Some other thoughts:

  • As a maintainer of NumPy, SciPy and PyWavelets, I'd be happy to advocate for having each project own and support a conda package "officially", so it gets treated on-par with PyPI/wheels. I can only do that if the model (1-4 above) is clear though.
  • Socially, how does the conda(-forge) community want to be seen and interacted with by projects and maintainers? Like PyPI/pip/wheels, or like Debian?
  • Re "I'm not sure I understand. What do the maintainers want to do? Custom channel or in conda-forge?": happy to take the lead from the experts here. Ideally both are feasible; opinions between projects may differ.
  • There are more issues than (1-4 above) with channels. I've brought some up before and @msarahan told me "you are not factoring channels into your thinking enough", but I've never seen a real answer to how channels are supposed to work and how the whole "conda design" fits together. Practical advice has changed over the years (with e.g. --strict-channel-priority and putting conda-forge first in .condarc yes/no). But is there a holistic design or long-term vision that the conda and conda-forge teams share?
  • Packaging is hard, and there's limited energy/expertise. I want to package things once (like (4) above), and at most twice. With PyTorch it's done four times now: the PyTorch team does PyPI and their conda channel, then there's this feedstock, and pytorch in defaults. Also affects other hard-to-build projects - e.g. we still don't have a conda-forge SciPy package for Windows .....

I'll close by echoing @jph00's sentiment on Twitter: "anyway, I'll close by saying I really don't like bringing up negative issues like this, because it's stressful and tiring. I only do so about things I really care about and want to be successful. Like conda."

@rgommers

rgommers commented Oct 1, 2019

Thanks for the insights @bgruening

We can. But we need to play together. Bioconda, a channel with 7000 bioinformatics packages, depends on conda-forge. We recommend the channel order conda-forge > bioconda.

You can only do that when it's one-way, right? I mean, nothing in conda-forge could depend on the pytorch channel or any other channel than defaults? If that could be made bi-directional, then this feedstock could disappear, which would be very helpful.

@rgommers

rgommers commented Oct 1, 2019

all dependent packages are rebuilt (also in Bioconda), then you would probably also say that conda-forge is scalable :)

I meant specifically that the conda resolver speed problems depend on the size of the graph. It's still quite easy to run into this even with the nice improvements in conda 4.7.x. So if everything needs to be in conda-forge and the number of packages becomes of the same order as that on PyPI, that may not work. So from that perspective, "everything in conda-forge" seems quite unhealthy. Having channels interact well, like bioconda and conda-forge apparently do, may be much better.

@jakirkham
Member

Thanks everyone that jumped into this discussion and shared your thoughts. I think this has been extremely valuable. The next step would be to raise some well-scoped issues on the webpage repo for further discussion and resolution. @rgommers, are you happy to do this? 🙂

@jakirkham
Member

To your question @soumith (though others have offered some great answers too! 😄)

@jakirkham why would you like gpu-built pytorch in conda-forge? We already provide high-quality packages in the pytorch channel and I am really worried about support. Like, whenever a new pytorch version releases, the conda-forge one will be a bit behind and then there will be all kinds of conflicts. I'm putting into context the conversation on torchvision conda-forge repo that happened yesterday.

Sorry I read your previous comment as stating this plan was ok. Was this not what you meant? Or have you changed your mind?

In either case, it seems that various people have pushed to add the pytorch stack to conda-forge. Now we actually have several downstream packages in conda-forge that require pytorch. However, what I'm hearing is that the user experience is not very good due to the lack of GPU support. Removal would complicate the story for downstream packages that need pytorch. So it seems like the best course of action would be to make sure we have a fully featured pytorch package in conda-forge.

As to maintenance effort, I suspect (though maybe @jjhelmus can comment 😉) that defaults will try to rebase their current work on top of a pytorch package in conda-forge, which will make it easier for our communities to work together and improve the defaults and conda-forge ecosystems collectively.

If you have particular thoughts on how a conda-forge pytorch package can be built, we would appreciate hearing them and would happily incorporate this feedback. In turn if you'd like to continue doing your own build, you can use the recipe we work on together. Alternatively you could reuse the binaries we produce (after whatever validation seems appropriate to you) or encourage users to get the package from conda-forge. In any event, I'd hope you could benefit from this shared effort.

Thoughts? 🙂

@rgommers

rgommers commented Oct 1, 2019

Thanks everyone that jumped into this discussion and shared your thoughts. I think this has been extremely valuable. The next step would be to raise some well-scoped issues on the webpage repo for further discussion and resolution. @rgommers, are you happy to do this? 🙂

Thanks @jakirkham, yes I'll do my best to break this up and create actionable issues. It may take me a little while ....

@hmaarrfk
Contributor Author

hmaarrfk commented Oct 1, 2019

Is anybody today depending on pytorch from conda-forge? This package is called pytorch-cpu explicitly to give us time to experiment with compiling such a large package without giving users or maintainers the false sense that they are installing a GPU-compatible package.

@msarahan
Member

msarahan commented Oct 1, 2019

  1. release sdist and wheels to PyPI, let other packaging teams (Debian, Homebrew, conda-forge, etc.) take it from there

Let me reiterate that the work that is being done here by "other packaging teams" is integration. That integration is not done by PyPI in any meaningful way, and part of what results is the dependency clobbering that you dismiss as "will be fixed one day." Another part of it is any library loading disasters that result from library load order. In an ideal world, auditwheel and machomachomangler take care of things like this, but is this an ideal world?

  2. release to PyPI and conda-forge in parallel myself

Given the bot, this is ideally little work once it is set up. I say "ideally" in the same sense as above with auditwheel and machomachomangler.

  3. release to PyPI and my own conda channel in parallel myself; let conda-forge sync with my channel somehow

This is assuming "my own conda channel" and conda-forge are readily sync-able. The hitch here is that "my own conda channel" may take convenient shortcuts, such as using a newer base system (newer glibc) that makes things easier to build, but also means that conda-forge either needs to do massive work, or that they can't sync.

  4. release to one conda channel only, and let conda-press produce wheels from that that I then upload to PyPI (long-term even nicer, only one build toolchain)

conda-press is ignoring pretty large issues like the C++ ABI difference and the fact that conda packages depend on a shipped libstdc++, not the system. I'd love to see it bridge those gaps, but I fear the fundamental gap between the conda standard compiling approach and the PyPA standard compiling approach may be too much to bridge in all cases.

As an ecosystem-wide contributor, I want to be able to tell users how to easily install and use large parts of the NumPy/PyData ecosystem. Ideally this is something like "download [Ana|Mini]conda, open your IDE of choice and work through these tutorials" followed by "if something is not in the defaults channel, do X". This is harder today than it was 3 years ago ...

Seriously? In the vast majority of cases, conda-forge and defaults are plenty. It's not 100%, and I agree that the edge cases have gotten harder, but this is not an accurate statement.

Would other channels like pytorch be open to using the set of conventions that conda-forge use?

Conventions include the toolchain, such as glibc. pytorch may not be able to adopt these conventions if it has a fundamental need that disagrees with the conda-forge stack. At that point, it becomes a push for changing the conda-forge toolchain stack (which in effect implicitly changes defaults' toolchain as well, because we try to stay compatible). This has effects on where conda-forge packages can be safely assumed to run, as you are well aware.

Socially, how does the conda(-forge) community want to be seen and interacted with by projects and maintainers? Like PyPI/pip/wheels, or like Debian?

Conda-forge is all about distributed control. We don't have a central team of integrators. If more feedstocks were maintained by official project leaders, I think conda-forge and the user community would be thrilled. I very much understand if project maintainers just don't want to deal with it, though, and that's where the fallback to a Debian-like model happens.

There are more issues than (1-4 above) with channels. I've brought some up before and @msarahan told me "you are not factoring channels into your thinking enough", but I've never seen a real answer to how channels are supposed to work and how the whole "conda design" fits together. Practical advice has changed over the years (with e.g. --strict-channel-priority and putting conda-forge first in .condarc yes/no).

Channels are the notion of spaces where a coherent team dictates behavior. That team ideally uses a consistent toolchain across all packages in that channel. Package names are consistent within that channel. It is a technical solution to an arguably social problem - lining up practices, versions, and toolchains.

But is there a holistic design or long-term vision that the conda and conda-forge teams share?

Try to maintain binary compatibility with each other while providing both backwards compatibility for the large user base stuck on old enterprise OSes and capturing current compute capabilities? We don't get to plan the best approach to that. Things like pytorch are forcing functions for re-evaluation of our toolchain, but it is always in competition with dropping support for old-but-not-EoL platforms.

Conda 4.7's virtual package support might help with this, in that it will allow conda to make solver decisions based on the system state. Currently, it can be used for the cuda driver version installed on the system. It could also be used for the glibc present on the system, and then pytorch could require a particular value for that (or better, the toolchain used could impose that dependency automatically). This allows pytorch to use a newer toolchain while not imposing it on the rest of the channel. This kind of exception to the community standard would probably need some official process, though, or else the channel loses coherency quickly.
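A sketch of how such a constraint might be expressed in a recipe, assuming the __cuda virtual package described above (the __glibc line is purely hypothetical here, included to illustrate the idea):

```yaml
# Hypothetical run requirements using virtual packages, so the solver only
# selects this build on machines whose driver / glibc are new enough.
requirements:
  run:
    - __cuda >=10.1    # satisfied by the detected CUDA driver (conda >=4.7)
    - __glibc >=2.17   # sketched glibc virtual package, as discussed above
```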

Packaging is hard, and there's limited energy/expertise. I want to package things once (like (4) above), and at most twice. With PyTorch it's done four times now: the PyTorch team does PyPI and their conda channel, then there's this feedstock, and pytorch in defaults. Also affects other hard-to-build projects - e.g. we still don't have a conda-forge SciPy package for Windows .....

Keep in mind, they are all targeting different runtime environments. The number of builds can't be any less than the number of different runtime environments.

@rgommers

rgommers commented Sep 7, 2020

This is certainly not the right approach. I don't see why pytorch is special. We should just build them on conda-forge.

You don't even have GPU build hardware, right? There are more reasons, I hope @scopatz can summarize when he gets back; he said the exact same thing as a "conda-forge first principles" type response, but I believe I managed to convince him.

@isuruf
Member

isuruf commented Sep 7, 2020

You don't even have GPU build hardware, right?

We have a docker image with the compilers. Hardware is not needed to build AFAIK. After building the package, we can upload to a testing label and then move the package to main after doing testing on a local machine with the hardware.

@hmaarrfk
Contributor Author

hmaarrfk commented Sep 7, 2020

I mean, the fact that Anaconda has already published conda packages to https://anaconda.org/anaconda/pytorch about 6 months ago mostly invalidates whatever arguments we wish to have about control.

We are quite similar to defaults since a recent sync, so I think it is reasonable to ask that we collaborate instead of diverge.

@hmaarrfk
Contributor Author

hmaarrfk commented Sep 7, 2020

And for reference, here is a pointer to the installation instructions of the pytorch-family package I was talking about:
https://github.com/pytorch/fairseq#requirements-and-installation

I understand there isn't always an immediate business case (at Facebook or Continuum) to create a high-quality package for everything, which is where conda-forge comes in.

@rgommers

rgommers commented Sep 7, 2020

We have a docker image with the compilers. Hardware is not needed to build AFAIK. After building the package, we can upload to a testing label and then move the package to main after doing testing on a local machine with the hardware.

Just doing some manual testing seems like a recipe for broken packages. And you probably won't be able to test everything that way (e.g. multi-GPU stuff with torch.distributed). The battery of tests for PyTorch with various hardware and build configs is very large, and it's very common to have just some things break that you never saw locally.

And for reference, here is a pointer to the installation instructions of the pytorch family package i was talking about
https://github.com/pytorch/fairseq#requirements-and-installation

That's one package, and it has a "help wanted" issue for a conda package: facebookresearch/fairseq#1717. Contributing there and getting a first conda package into the pytorch channel seems like a much better idea than doing your own thing. You can then also use the CI system, so you can test the builds, and I'd imagine you get review/help from the fairseq maintainers.

@isuruf
Member

isuruf commented Sep 7, 2020

Just doing some manual testing seems like a recipe for broken packages. And you probably won't be able to test everything that way (e.g. multi-GPU stuff with torch.distributed). The battery of tests for PyTorch with various hardware and build configs is very large, and it's very common to have just some things break that you never saw locally.

How is this different from other packages like numpy, openblas, etc.?

@rgommers

rgommers commented Sep 7, 2020

How is this different from other packages like numpy, openblas, etc.?

For NumPy you actually run the tests. Examples:

Plus the number of ways to build NumPy is far smaller than with PyTorch (e.g., check the number of USE_xxx env vars in PyTorch's setup.py). So, it's very different.

@isuruf
Member

isuruf commented Sep 7, 2020

I'm suggesting we run the tests on a local machine with GPU hardware.

We don't test all the code paths in numpy. For example, there are AVX512 code paths that we don't test. We don't test POWER9 code paths. It's impossible to test all code paths.

@isuruf
Member

isuruf commented Sep 7, 2020

Plus the number of ways to build NumPy is far smaller than with PyTorch

There are lots of different ways to build openblas. See how many options we set in https://github.com/conda-forge/openblas-feedstock/blob/master/recipe/build.sh#L26-L50

@h-vetinari
Member

I have to agree that local testing is a poor substitute for a proper CI matrix, but of course that's not possible without a CI that has GPUs, see conda-forge/conda-forge.github.io#1062 - considering the impact conda-forge is having on the scientific computing stack in Python, one would hope this should be a tractable problem... (note the OP of that issue; it might be possible to hook self-hosted machines into the regular Azure CI).

With a concerted (and a bit more high-level) effort, I believe it might realistically be possible to convince Microsoft to sponsor the Python packaging ecosystem with some GPU CI time on Azure, but admittedly, that's just in my head (based on some loose but very positive interactions I had with their reps).

Re: "build coverage" - 100% might not be possible, but one can get pretty close, depending on the invested CI time. For example, even if we can now have 3-4 different CPU builds per platform/python-version/blas-version (via conda/conda#9930), it's still "only" a question of CI time to multiply the matrix of (e.g.) conda-forge/numpy-feedstock#196 by 3-4. For packages as fundamental as numpy/scipy, this is IMO worth the effort. Pytorch could fall into that category as well.

@isuruf
Member

isuruf commented Sep 7, 2020

I have to agree that local testing is a poor substitute for a proper CI matrix

How is it different if we run the tests in CI or locally before uploading to main label?

@h-vetinari
Member

How is it different if we run the tests in CI or locally before uploading to main label?

Reproducing a full matrix of combinations (different arches/OSes/Python versions/GPUs/CPUs/etc.) is not fundamentally impossible to do locally (I just said "poor substitute"), but it would take a huge amount of time (incl. a complicated virtualization setup for other OSes/arches), and be error-prone and opaque, compared to CI builds that run in parallel and can easily be inspected.

@isuruf
Member

isuruf commented Sep 7, 2020

Can we please stay on topic? @rgommers wants to copy binaries from the pytorch channel, which is definitely not transparent, nor can it be easily inspected.

@isuruf
Member

isuruf commented Sep 7, 2020

If anyone wants to talk more on this, please come to a core meeting.

@h-vetinari
Member

I'm all for building in conda-forge BTW, just saying that I can see the argument why this shouldn't come at the cost of reduced (GPU-)CI coverage (and hence bringing up the GPU-in-CF-CI thing, which would make it possible to kill both birds with one stone).

@hadim
Member

hadim commented Nov 6, 2020

For what it's worth, I am also interested in pytorch on conda-forge (with CUDA and non-CUDA support). In addition to all the advantages cited above, it would allow conda packages to compile against pytorch.

Copying binaries is fine for me (I am being pragmatic here), but like probably everyone here, I would much prefer to have those packages built directly on conda-forge.

@hmaarrfk
Contributor Author

For those of you involved in packaging pytorch, we are interested in pushing #22 through with the package name pytorch, creating direct competition with the pytorch package advertised on PyTorch's website.

Now that conda-forge supports GPUs, I think it is safe for us to do so.

If there are any other reasons that should be brought up at this stage, please let us know in the PR.

Thanks for all your input so far!

@rgommers

Now that conda-forge supports GPUs

Is the current status documented somewhere? I found conda-forge/conda-forge.github.io#901 as the TODO item to write docs, maybe there's something else?

@hmaarrfk
Contributor Author

The pull request #22 is probably the best current documentation on how to use it :D

@seemethere

Hello! I'm from the release engineering team for PyTorch. Please let us know if there's any way we can assist in making the conda-forge installation experience for pytorch as smooth as possible.

cc @malfet

@isuruf
Member

isuruf commented Dec 15, 2020

@seemethere, thanks for the offer. One task you could help with is a way to collect the licenses/copyright notices of the third party dependencies to comply with their license terms.

@henryiii

There's not a lot of documentation AFAIK, but https://github.com/conda-forge/goofit-split-feedstock/blob/master/recipe/meta.yaml is an example of a split GPU / CPU package.
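For orientation, a stripped-down sketch of that split-package pattern might look roughly like the following; the names, versions, and structure here are illustrative and not copied from the goofit recipe:

```yaml
# Hypothetical split CPU/GPU recipe skeleton using multiple outputs.
package:
  name: somepkg-split        # illustrative name only
  version: 1.0.0

outputs:
  - name: somepkg            # CPU-only output
    requirements:
      build:
        - {{ compiler('cxx') }}
  - name: somepkg-cuda       # GPU-enabled output
    requirements:
      build:
        - {{ compiler('cxx') }}
        - {{ compiler('cuda') }}
      run:
        - {{ pin_subpackage('somepkg', exact=True) }}
        - cudatoolkit
```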

@isuruf
Member

isuruf commented Jan 17, 2021

@seemethere, thanks for the offer. One task you could help with is a way to collect the licenses/copyright notices of the third party dependencies to comply with their license terms.

Any updates on this?

@hmaarrfk
Contributor Author

Closing this issue as the original issue has been resolved.

I opened #34 to discuss licensing.
