GPU Support #7
Probably a few things. Here are some thoughts based on the work we have been doing to get GPU packages to build.
The next thing we need to figure out is how to test the packages. There has been some good discussion and some investigation into possible options. Still more to do here though. |
@soumith, do you have any thoughts on this? 🙂 |
@jakirkham's plan sounds about right. The official PyTorch conda binaries live in the `pytorch` channel. |
My understanding is that PyTorch does dynamic loading of the CUDA libraries, so a package built with GPU support will work on systems without a GPU. A CPU-only variant would still be a nice addition, since the GPU variant is a large download and requires the `cudatoolkit` package. |
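That dynamic-loading behavior is easy to check by hand. A minimal sketch, assuming the `pytorch` channel packages discussed above (version pins omitted):

```bash
# Install the GPU-enabled build, then run it on a machine without a GPU.
# torch.cuda.is_available() simply returns False instead of crashing,
# because the CUDA libraries are only loaded when a GPU is actually used.
conda install -c pytorch pytorch cudatoolkit
python -c "import torch; print(torch.cuda.is_available())"
```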
I think we are now in a good place to try building a GPU enabled pytorch package in conda-forge. Happy to give this a go if that sounds reasonable. 🙂 |
The CPU builds are timing out on Windows :(
|
@jakirkham why would you like gpu-built pytorch in conda-forge? We already provide high-quality packages in the pytorch channel and I am really worried about support. Like, whenever a new pytorch version releases, the conda-forge one will be a bit behind and then there will be all kinds of conflicts. I'm putting into context the conversation on the torchvision conda-forge repo that happened yesterday. |
@soumith, thanks for keeping up with the conversation. XREF: conda-forge/torchvision-feedstock#2. Adding pytorch to conda-forge has a few advantages:
I agree that uploading the torchvision package before the pytorch package was in place was likely a mistake. For users that depend on torchvision, I think the 0.2 package version is …

As for being behind on the builds, part of that is that the conda-forge infrastructure isn't set up to automatically detect new tags on GitHub. The tarballs your team uploads/generates on GitHub do not contain the third-party libraries, so in order to build everything I had to use the git repo. I think I can add the tarball to trick the updater into automatically rebuilding, but at this point the Azure machines just ran out of RAM. I might have to take inspiration from your work: https://github.com/pytorch/builder/tree/master/conda

Finally, the last advantage is that conda-forge is also looking beyond x86 architectures; using the conda-forge platform would enable a pathway toward ARM/PPC builds (though those are blocked on the graphics stack for now). |
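For reference, building from the repository instead of the release tarball looks roughly like this in a recipe. A minimal sketch, with a hypothetical tag:

```yaml
source:
  # The auto-generated GitHub tarballs do not include the third_party/
  # submodules, so the recipe clones the repository instead; conda-build
  # checks out submodules as part of the clone.
  git_url: https://github.com/pytorch/pytorch.git
  git_rev: v1.2.0  # hypothetical: pin to the release being packaged
```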
From the perspective of another maintainer of GPU packages that reside in a different channel than conda-forge, it makes dependency management and the end-user experience much nicer / easier when there is a one-stop shop to get their packages. From what I've seen, users typically don't modify their `.condarc` file and just add channels to individual install commands, and then things get unexpectedly downgraded / upgraded and the end user has a bad time. |
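To make the contrast concrete, a minimal sketch of the two workflows being described (channel names only, no pins):

```bash
# Ad-hoc: the extra channel applies to this one solve only, so later
# installs from defaults/conda-forge can unexpectedly up/downgrade things.
conda install -c pytorch pytorch

# Persistent: the channel is recorded in .condarc and participates in
# every subsequent solve.
conda config --add channels conda-forge
conda config --set channel_priority strict
```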
They add channels to individual install commands because, as maintainers, we often want to give them a single command, on one line, to execute. I'm definitely guilty of this, especially for pure Python packages, when I have a feeling that users don't want global changes to their environments.
|
This issue and the related discussion in conda-forge/torchvision-feedstock#2 do point out some real issues with the conda/conda-forge model. Questions like:

- Should a project publish its packages in its own channel or in conda-forge?
- Are project maintainers expected to release to conda-forge themselves, or does someone else do that for them?
- Does everything have to live inside conda-forge, or can packages in separate channels depend on each other?
- Does the conda/conda-forge model scale?
Note, I know this is probably not the optimal place to discuss this, and neither is Twitter (Cc @mrocklin and @jph00). But what is? I honestly don't know the answers to any of these questions, and that's pretty telling, given that I've been involved in packaging for a long time and am a maintainer on the NumPy and SciPy feedstocks. I've just scanned through the conda-forge docs again, and they don't provide answers.
This is a rule that's more social than a hard technical requirement.
Again social. …
This is true, maybe, sometimes. Package maintainers, with a few notable exceptions (Arrow, TensorFlow 1.x; EDIT: not even true anymore for Arrow, it looks like: 0.14.1 has binary wheels, and TensorFlow 2.0 also has compliant manylinux wheels now), do make this work on PyPI, so there's no real reason it couldn't be made to work cross-channel within conda given the right set of conventions/specs/tools. |
Mixing channels has not worked out so well historically, which is why we now have strict channel priority. It would be interesting to see what some "cross-compatible" channels would look like (or how that could ever be enforced in a way that gets the blessing of conda/conda-forge). But while it is mostly a social convention (as @rgommers mentions), channels have a big impact in corporate environments, where the rest of the internet is usually behind a proxy. Getting anything other than the main channels + conda-forge past IT / sysadmins / etc. is a hassle, both procedurally and technically, so every extra channel has a substantial incremental cost, while conda-forge just works (after the initial setup).
CC @conda-forge/core @mingwandroid Edit: Probably best to start (at least) at Ralf's post. |
pip doesn't respect version constraints of already-installed packages, which makes it easy to break environments. And who sets this "right set of conventions"? Even defaults and conda-forge can't agree on conventions.
It's up to the maintainers.
There are some people who do maintain their feedstocks outside of conda-forge. conda-smithy supports creating feedstocks and uploading to a custom channel.
No. Packages can be in other channels, but cross-channel dependencies mean we lose control. The conda-forge community does a lot of work to maintain ABI compatibility and the ability to create consistent environments. @conda-forge/core is frequently called on to help merge PRs in feedstocks that have been abandoned. |
that's not really relevant here (and will be solved at some point)
This will need solving one way or another I think.
PyTorch relies on MKL features beyond plain BLAS, like FFT-related functionality. So the BLAS-switching isn't relevant here. … |
That's not a real answer. The point is: project maintainers may very well be willing to help, but it's not clear how. The conda-forge team/community needs to have a clear vision here. Right now the norm for releasing any project is: release on PyPI as wheels and sdist, then let someone else worry about conda-forge (that someone else could be an individual project maintainer, or a user, or a conda-forge/core member - but it's not the project release manager normally). The norm could change to having releases to conda-forge, or to a custom conda channel, be part of the official project release procedure.
This sounds like there could be part of an answer in here ....
That's missing the point: this … |
I'm not sure I understand. What do the maintainers want to do? Custom channel or in conda-forge? |
Probably best to start (at least) at Ralf's post. I've edited my post accordingly. |
Okay. How about the fact that the pytorch conda packages in the pytorch channel require GLIBC 2.17 (CentOS 7), while conda-forge uses GLIBC 2.12 (CentOS 6)? Everything boils down to the set of conventions used. Would other channels like pytorch be open to using the set of conventions that conda-forge uses? |
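For anyone wanting to verify which GLIBC a given binary needs, a rough sketch (the library path is hypothetical):

```bash
# List the GLIBC symbol versions a shared library references; the highest
# one is the minimum glibc the binary needs at runtime.
objdump -T site-packages/torch/lib/libtorch.so \
  | grep -o 'GLIBC_[0-9.]*' | sort -Vu | tail -n 1
```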
The fact that conda-forge uses GLIBC 2.12 (CentOS 6) is a show-stopper for recipes that want to install the latest cudatoolkit (10.1 currently), whose installer requires GLIBC 2.14 or newer. For instance, the PR conda-forge/cudatoolkit-dev-feedstock#16 is held back because of this, and I would actually hope that conda-forge starts using a newer GLIBC rather than other channels adopting the old one. |
@pearu, can you open a different issue for increasing the GLIBC version? (For others who are curious: the latest cudatoolkit does not require GLIBC 2.14, per the documentation at https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html. The linked PR was just using a wrong URL.) (Off-topic: there are people interested in making manylinux2010 (CentOS 6) wheels from conda packages, and increasing the GLIBC version would stop that.) |
Hi all, thanks for this discussion. Let me try to share my experience as someone who has contributed to the conda ecosystem for four years and has migrated (with a lot of help) more than 1000 packages to conda-forge.
On conda-forge or a community channel (and here I mean broader scientific fields, not project-wide communities)? I would always recommend conda-forge, for multiple reasons, but the most obvious one is that you want your users to use as few channels as possible. It is never good to have too many channels/PPAs/... activated.
They do! I know a lot of upstream maintainers that do this, and I would always recommend it. What happens on staged-recipes is that upstream maintainers are very often pinged and asked if they want to co-maintain the feedstock. IMHO this works very well.
Everything that is stable, yes, why not. Just keep the number of channels as low as possible.
For unstable stuff, for training, for beta versions, for people that don't like to play with communities or don't recognize the complexity of integration. It's a matter of providing freedom and trying different community models (Bioconda vs. conda-forge, etc.), and it's good to have this choice. As a matter of fact, the conda-forge model (even if not perfect) is the most scalable approach we have seen yet, and that's a Bioconda core member speaking here. Please keep in mind that multiple channels are always more complicated to maintain than just one. Another example is namespace clashes: you have those way better under control in one channel than in ten channels.
We can. But we need to play together. Bioconda, a channel with 7000 bioinformatics packages, depends on conda-forge. We recommend the channel order conda-forge > bioconda. We have linters in conda-forge and Bioconda that prevent name clashes. Bioconda members are part of conda-forge, we agree on the same glibc version, we sync the pinnings; we (Bioconda) essentially follow conda-forge and discuss with them how conda-forge evolves, as we depend on them. But there needs to be the will to work together and to invest the time and effort to make this happen.
Not sure what you are referring to, but as someone who has run >1000 environments over four years on all kinds of infrastructure, from HPC to cloud, and as a maintainer of BioContainers (building containers out of conda packages): conda is scalable. It is the most scalable package manager I have seen so far, yes, even more scalable than dpkg and such. But this also means it's way more complex. If you acknowledge that, and if you have seen how this community maintains >10 languages and >1000 R packages (which are rebuilt every 6 months), how we have rebuilt everything against new compilers, and how all dependent packages are rebuilt (also in Bioconda) whenever boost or any other library (such as zlib) gets a new version, then you would probably also say that conda-forge is scalable :)
That is exactly what @isuruf was trying to say. It is a kind of agreement that many hundreds of people have made, and there are valid reasons for sticking to GLIBC 2.12, for example that all the HPC environments I know are running CentOS 6 (or similar systems), and this will not go away soon. Gosh, I saw an HPC system with CentOS 5 a year ago :( In the end my key points are: …
Thanks again for starting this discussion; happy to answer any questions, also with regard to Bioconda and how we maintain a separate channel while staying compatible with conda-forge ❤️ |
It's important, but not the right place to start. The needed conventions follow from goals/requirements that affect users/projects/maintainers/redistributors/etc., so I'll start there. As a user, I want to be able to `conda install` any package I need and end up with a working environment. As a maintainer, I want to make my package available to all conda users with minimal overhead. Roughly, the options are: (1) release wheels and an sdist on PyPI and let conda-forge build from those; (2) release to conda-forge (or a custom channel) as part of the project's release process; (3) release conda packages in the project's own channel and have conda-forge stay in sync with them.
Right now, NumPy, SciPy and the majority of packages do (1). PyTorch does the first part of (3), but conda-forge doesn't "sync" correctly. As an ecosystem-wide contributor, I want to be able to tell users how to easily install and use large parts of the NumPy/PyData ecosystem. Ideally this is something like "download [Ana|Mini]conda, open your IDE of choice and work through these tutorials", followed by "if something is not in the defaults channel, do X". This is harder today than it was 3 years ago ...
Note that I'm not a PyTorch maintainer (although I am contributing), so I won't try to answer that for @soumith. I believe this problem isn't really PyTorch-specific though. Some other thoughts: …
I'll close by echoing @jph00's sentiment on Twitter: I really don't like bringing up negative issues like this, because it's stressful and tiring. I only do so about things I really care about and want to be successful. Like conda. |
Thanks for the insights @bgruening
You can only do that when it's one-way, right? I mean, nothing in conda-forge depends on Bioconda. |
I meant specifically that the conda resolver speed problems depend on the size of the graph. It's still quite easy to run into this even with the nice improvements in conda 4.7.x. So if everything needs to be in conda-forge and the number of packages becomes of the same order as that on PyPI, that may not work. So from that perspective, "everything in conda-forge" seems quite unhealthy. Having channels interact well, like bioconda and conda-forge apparently do, may be much better. |
Thanks everyone that jumped into this discussion and shared your thoughts. I think this has been extremely valuable. The next step would be to raise some well-scoped issues on the webpage repo for further discussion and resolution. @rgommers, are you happy to do this? 🙂 |
To your question @soumith (though others have offered some great answers too! 😄)
Sorry, I read your previous comment as stating this plan was OK. Was this not what you meant? Or have you changed your mind?

In either case, it seems that various people have pushed to add the pytorch stack to conda-forge, and we now actually have several downstream packages in conda-forge that require pytorch. However, what I'm hearing is that the user experience is not very good due to the lack of GPU support. Removal would complicate the story for the downstream packages that need pytorch, so it seems like the best course of action is to make sure we have a fully featured pytorch package in conda-forge. As to maintenance effort, I suspect (though maybe @jjhelmus can comment 😉) that …

If you have particular thoughts on how a conda-forge pytorch package can be built, we would appreciate hearing them and would happily incorporate the feedback. In turn, if you'd like to continue doing your own build, you can use the recipe we work on together. Alternatively, you could reuse the binaries we produce (after whatever validation seems appropriate to you) or encourage users to get the package from conda-forge. In any event, I'd hope you could benefit from this shared effort. Thoughts? 🙂 |
Thanks @jakirkham, yes I'll do my best to break this up and create actionable issues. It may take me a little while .... |
Is anybody today depending on pytorch from conda-forge? This package is called `pytorch-cpu` explicitly to give us time to experiment with compiling such a large package, without giving users or maintainers the false sense that they are installing a GPU-compatible package. |
Let me reiterate that the work that is being done here by "other packaging teams" is integration. That integration is not done by PyPI in any meaningful way, and part of what results is the dependency clobbering that you dismiss as "will be fixed one day." Another part of it is any library loading disasters that result from library load order. In an ideal world, auditwheel and machomachomangler take care of things like this, but is this an ideal world?
Given the bot, this is ideally little work once it is set up. I say "ideally" in the same sense as above with auditwheel and machomachomangler.
This is assuming "my own conda channel" and conda-forge are readily sync-able. The hitch here is that "my own conda channel" may take convenient shortcuts, such as using a newer base system (newer glibc) that makes things easier to build, but also means that conda-forge either needs to do massive work, or that they can't sync.
conda-press is ignoring pretty large issues like the C++ ABI difference and the fact that conda packages depend on a shipped libstdc++, not the system. I'd love to see it bridge those gaps, but I fear the fundamental gap between the conda standard compiling approach and the PyPA standard compiling approach may be too much to bridge in all cases.
Seriously? In the vast majority of cases, conda-forge and defaults are plenty. It's not 100%, and I agree that the edge cases have gotten harder, but this is not an accurate statement.
Conventions include the toolchain, such as glibc. pytorch may not be able to adopt these conventions if it has a fundamental need that disagrees with the conda-forge stack. At that point, it becomes a push for changing the conda-forge toolchain stack (which in effect implicitly changes defaults' toolchain as well, because we try to stay compatible). This has effects on where conda-forge packages can safely be assumed to run, as you are well aware.
Conda-forge is all about distributed control. We don't have a central team of integrators. If more feedstocks were maintained by official project leaders, I think conda-forge and the user community would be thrilled. I very much understand if project maintainers just don't want to deal with it, though, and that's where the fallback to a Debian-like model happens.
Channels are the notion of spaces where a coherent team dictates behavior. That team ideally uses a consistent toolchain across all packages in that channel. Package names are consistent within that channel. It is a technical solution to an arguably social problem - lining up practices, versions, and toolchains.
Try to maintain binary compatibility with each other while providing both backwards compatibility for the large user base stuck on old enterprise OS and capturing current compute capabilities? We don't get to plan the best approach to that. Things like pytorch are forcing functions for re-evaluation of our toolchain, but it is always in competition with dropping support for old-but-not-EoL platforms. Conda 4.7's virtual package support might help with this, in that it will allow conda to make solver decisions based on the system state. Currently, it can be used for the cuda driver version installed on the system. It could also be used for the glibc present on the system, and then pytorch could require a particular value for that (or better, the toolchain used could impose that dependency automatically). This allows pytorch to use a newer toolchain while not imposing it on the rest of the channel. This kind of exception to the community standard would probably need some official process, though, or else the channel loses coherency quickly.
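A minimal sketch of what such a constraint could look like in a recipe; the `__cuda` line reflects what the comment above says exists today, while the `__glibc` line is the hypothetical extension:

```yaml
requirements:
  run:
    # Virtual packages describe the system rather than an installed package;
    # conda fills them in at solve time from the machine's state.
    - __cuda >=10.0   # require a new-enough CUDA driver on the system
    - __glibc >=2.17  # hypothetical: require a new-enough system glibc
```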
Keep in mind, they are all targeting different runtime environments. The number of builds can't be any less than the number of different runtime environments. |
You don't even have GPU build hardware, right? There are more reasons; I hope @scopatz can summarize them when he gets back. He said the exact same thing as a "conda-forge first principles" type response, but I believe I managed to convince him. |
We have a docker image with the compilers; hardware is not needed to build, AFAIK. After building the package, we can upload to a testing label and then move the package to the main label. |
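A rough sketch of that label workflow with anaconda-client (the file name and version are hypothetical, and the exact promotion command may differ):

```bash
# Upload under a non-default label so plain installs don't pick it up yet.
anaconda upload --label testing linux-64/pytorch-1.2.0-py37_0.tar.bz2

# Install from the label explicitly to test the candidate package.
conda install -c conda-forge/label/testing pytorch

# Once it checks out, promote it to the default label.
anaconda copy --from-label testing --to-label main conda-forge/pytorch/1.2.0
```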
I mean, the fact that Anaconda has already published conda packages to https://anaconda.org/anaconda/pytorch about 6 months ago mostly invalidates whatever arguments we wish to have about control. We are quite similar to defaults since a recent sync, so I think it is reasonable to ask that we collaborate instead of diverge. |
And for reference, here is a pointer to the installation instructions of the pytorch family package I was talking about: … I understand there isn't always an immediate business case (at Facebook or Continuum) to create a high-quality package for everything, which is where conda-forge comes in. |
Just doing some manual testing seems like a recipe for broken packages. And you probably won't be able to test everything that way (e.g. multi-GPU stuff with …). |
That's one package, and it has a "help wanted" issue for a conda package: facebookresearch/fairseq#1717. Contributing there and getting a first conda package into the … |
How is this different from other packages like `numpy`? |
For NumPy you actually run the tests. Examples: … Plus, the number of ways to build NumPy is far smaller than with PyTorch (e.g., check the number of …). |
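For context, a hedged sketch of what "actually running the tests" looks like in a recipe's test section; the real numpy-feedstock commands differ in detail:

```yaml
test:
  requires:
    - pytest
  commands:
    # Run the upstream test suite against the freshly built package and
    # fail the build if the suite does not pass.
    - python -c "import numpy, sys; sys.exit(0 if numpy.test() else 1)"
```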
I'm suggesting we run the tests on a local machine with GPU hardware. We don't test all the code paths in numpy either: for example, there are AVX-512 code paths that we don't test, and we don't test POWER9 code paths. It's impossible to test every code path. |
There are lots of different ways to build openblas. See how many options we set in https://github.com/conda-forge/openblas-feedstock/blob/master/recipe/build.sh#L26-L50 |
I have to agree that local testing is a poor substitute for a proper CI matrix, but of course that's not possible without a CI that has GPUs; see conda-forge/conda-forge.github.io#1062. Considering the impact conda-forge is having on the scientific computing stack in Python, one would hope this is a tractable problem (note the OP of that issue; it might be possible to hook self-hosted machines into the regular Azure CI). With a concerted (and a bit more high-level) effort, I believe it might realistically be possible to convince Microsoft to sponsor the Python packaging ecosystem with some GPU CI time on Azure, but admittedly that's just in my head (based on some loose but very positive interactions I've had with their reps). Re: "build coverage": 100% might not be possible, but one can get pretty close, depending on the invested CI time. For example, even though we can now have 3-4 different CPU builds per platform/Python version/BLAS version (via conda/conda#9930), it's still "only" a question of CI time to multiply the matrix of (e.g.) conda-forge/numpy-feedstock#196 by 3-4. For packages as fundamental as numpy/scipy, this is IMO worth the effort. PyTorch could fall into that category as well. |
How is it different if we run the tests in CI or locally before uploading to the main label? |
Reproducing a full matrix of combinations (different arches/OSes/Python versions/GPUs/CPUs/etc.) locally is not fundamentally impossible (I just said "poor substitute"), but it would take a huge amount of time (including complicated virtualization setups for other OSes/arches) and be error-prone and opaque, compared to CI builds that run in parallel and can easily be inspected. |
Can we please stay on topic? @rgommers wants to copy binaries from the `pytorch` channel. |
If anyone wants to talk more on this, please come to a core meeting. |
I'm all for building in conda-forge BTW; I'm just saying that I can see the argument why this shouldn't come at the cost of reduced (GPU) CI coverage (hence bringing up the GPU-in-conda-forge-CI thing, which would make it possible to kill two birds with one stone). |
For what it's worth, I am also interested in pytorch on conda-forge (with and without CUDA support). In addition to all the advantages cited above, it would allow conda packages to compile against pytorch. Copying binaries is fine for me (I am being pragmatic here), but like probably everyone here I would much prefer to have those packages built directly on conda-forge. |
For those of you involved in packaging pytorch: we are interested in pushing through with the package name `pytorch`. Now that conda-forge supports GPUs, I think it is safe for us to do so. If there are any other concerns that should be brought up at this stage, please let us know in the PR. Thanks for all your input so far! |
Is the current status documented somewhere? I found conda-forge/conda-forge.github.io#901 as the TODO item to write docs, maybe there's something else? |
the pull request #22 is probably the best current documentation on how to use it :D |
Hello! I'm from the release engineering team for PyTorch. Please let us know if there's any way we can assist in making these packages happen. cc @malfet |
@seemethere, thanks for the offer. One task you could help with is a way to collect the licenses / copyright notices of the third-party dependencies, so the packages comply with their license terms. |
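Once such notices are collected, a recipe can ship them alongside the package. A minimal sketch; the file paths are hypothetical:

```yaml
about:
  license: BSD-3-Clause
  # conda-build copies every listed file into the package's
  # info/licenses/ directory, so third-party notices travel with it.
  license_file:
    - LICENSE
    - NOTICE
    - third_party/LICENSES_BUNDLED.txt  # hypothetical aggregated notices
```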
There's not a lot of documentation AFAIK, but https://github.com/conda-forge/goofit-split-feedstock/blob/master/recipe/meta.yaml is an example of a split GPU / CPU package. |
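The gist of that split-variant pattern, as a hedged sketch (names and versions are illustrative; see the linked recipe for the real thing):

```yaml
# conda_build_config.yaml lists one variant per entry:
#   cuda_compiler_version:
#     - None    # CPU-only build
#     - 10.0    # GPU build
# meta.yaml then keys the build string and compiler on the variant:
build:
  string: cuda{{ cuda_compiler_version | replace('.', '') }}_{{ PKG_BUILDNUM }}  # [cuda_compiler_version != "None"]
  string: cpu_{{ PKG_BUILDNUM }}  # [cuda_compiler_version == "None"]
requirements:
  build:
    - {{ compiler('c') }}
    - {{ compiler('cuda') }}  # [cuda_compiler_version != "None"]
```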
any updates on this? |
Closing, as the original issue has been resolved. I opened #34 to discuss licensing. |
@jjhelmus it seems you were able to build a GPU-enabled pytorch package without needing variants:
https://anaconda.org/anaconda/pytorch/files?version=1.0.1
Is that true?
If so, what challenges do you see moving this work to conda-forge?