
[REVIEW] fetch thrust/cub from github #5315

Merged · 19 commits · Jun 5, 2020

Conversation

@rongou (Contributor) commented May 29, 2020

Use FetchContent to obtain the thrust/cub 1.9.0 release. This is needed to get the per-thread default stream to work properly. Counterpart to rapidsai/rmm#378.

Closes #5346

@harrism @jrhemstad @mt-jones
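
For reference, a minimal sketch of what the FetchContent approach can look like; the repository URLs, tags, and include-path handling below are illustrative placeholders, not necessarily the exact CMake in this PR:

```cmake
# Minimal sketch of fetching pinned Thrust/CUB with CMake's FetchContent
# (illustrative placeholders, not the exact change in this PR).
# FetchContent is available in CMake >= 3.11.
include(FetchContent)

FetchContent_Declare(
  thrust
  GIT_REPOSITORY https://github.com/thrust/thrust.git
  GIT_TAG        1.9.0   # placeholder; pin to the release being targeted
)
FetchContent_Declare(
  cub
  GIT_REPOSITORY https://github.com/NVlabs/cub.git
  GIT_TAG        1.9.0   # placeholder
)

# Download the sources (header-only, so nothing is built here).
FetchContent_GetProperties(thrust)
if(NOT thrust_POPULATED)
  FetchContent_Populate(thrust)
endif()

FetchContent_GetProperties(cub)
if(NOT cub_POPULATED)
  FetchContent_Populate(cub)
endif()

# Put the fetched headers ahead of the CUDA Toolkit's bundled Thrust/CUB so
# the pinned versions are found first at compile time.
include_directories(BEFORE "${thrust_SOURCE_DIR}" "${cub_SOURCE_DIR}")
```

The include ordering is the delicate part, since nvcc adds the Toolkit's own include directory (which bundles its own Thrust) by default.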

@rongou requested review from a team as code owners May 29, 2020 01:35
@rongou requested review from cwharris and rgsl888prabhu May 29, 2020 01:35
@GPUtester (Collaborator) commented:

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@rongou (Contributor, Author) commented May 29, 2020

@harrism @jrhemstad these errors seem to be caused by gcc-5. Building with gcc-7 works, at least for me.

@kkraus14 (Collaborator) commented:

> @harrism @jrhemstad these errors seem to be caused by gcc-5. Building with gcc-7 works, at least for me.

We can't just remove gcc-5 support, as that's the default compiler shipped with Ubuntu 16.04, for example.

@rongou (Contributor, Author) commented May 29, 2020

Let me dig through the code. Maybe we can do something on our end to fix this.

@harrism (Member) commented Jun 1, 2020

Closes #5346

@rongou (Contributor, Author) commented Jun 3, 2020

rerun tests

@rongou requested a review from a team as a code owner June 4, 2020 05:17
@rongou (Contributor, Author) commented Jun 4, 2020

@harrism I've been seeing the "build timed out" error a lot. Is it possible to increase the timeout value?

@kkraus14 (Collaborator) commented Jun 4, 2020

> @harrism I've been seeing the "build timed out" error a lot. Is it possible to increase the timeout value?

No. That typically means a web server or something like it is being flaky, and we don't want a runaway job sitting there fighting with it.

@harrism (Member) left a comment

Thanks for fixing the Cython paths!

@kkraus14 (Collaborator) left a comment

Requesting changes to block merging until we're aligned on the behavior when someone comes in with code built against the toolkit-shipped Thrust / CUB version.

@rongou (Contributor, Author) commented Jun 4, 2020

> Requesting changes to block merging until we're aligned on the behavior when someone comes in with code built against the toolkit-shipped Thrust / CUB version.

Since thrust/cub are header-only libraries, code built against a different thrust/cub version is insulated from cuDF. Assuming that code is tested in its own environment, just as cuDF is tested against the thrust/cub version fetched here, there shouldn't be a problem, since there are no runtime dependencies between the different versions.

@rongou requested a review from kkraus14 June 5, 2020 00:13
@kkraus14 (Collaborator) commented Jun 5, 2020

> Since thrust/cub are header-only libraries, code built against a different thrust/cub version is insulated from cuDF. Assuming that code is tested in its own environment, just as cuDF is tested against the thrust/cub version fetched here, there shouldn't be a problem, since there are no runtime dependencies between the different versions.

This is only true if Thrust types aren't exposed in our APIs. What if we have an API that takes a thrust::device_vector in the future, and the signature of thrust::device_vector changes between the toolkit version and the version we're using? What would happen if a library returned its thrust::device_vector and tried to pass it into a libcudf API that expects a newer version of thrust::device_vector?

@kkraus14 added labels 3 - Ready for Review and CMake on Jun 5, 2020
@harrism (Member) commented Jun 5, 2020

> This is only true if Thrust types aren't exposed in our APIs. What if we have an API that takes a thrust::device_vector in the future, and the signature of thrust::device_vector changes between the toolkit version and the version we're using? What would happen if a library returned its thrust::device_vector and tried to pass it into a libcudf API that expects a newer version of thrust::device_vector?

(We already have such APIs, but they are mostly detail APIs. The single exception is cudf::make_strings_column(), but that is arguably a detail API too.)

device_vector is a class template, so the types are specified at compile time. If there is a conflict, I would expect it to result in compile-time errors, because a user can't use any libcudf APIs that take or return device_vector without compiling those calls themselves.

While there could be problems here, I think the same problem would happen with header-only libraries that are NOT part of the CUDA Toolkit, for example if someone linked librmm into an application that also has its own use of spdlog, or if they used a different STL implementation. I think your fear is healthy, but I don't think it should stop us from going forward.

@kkraus14 (Collaborator) commented Jun 5, 2020

Just wanted to make sure that the limitations / possible issues were at least given some thought and discussed. Seems like we're okay with the tradeoff, so approved. 😄

@rongou (Contributor, Author) commented Jun 5, 2020

C++ dependency management certainly can be tricky. At Google the solution is to build everything from source: each library is versioned, and even the toolchains (gcc/clang/nvcc, etc.) are versioned. This is the philosophy behind Bazel. But to make it work at scale, you really need a distributed build system, a content-addressable object store, distributed CI, and so on; otherwise everyone would always be waiting for the build to finish on their desktop (it already takes some time for cuDF :). Without that kind of infrastructure, I guess all you can do is be very careful: when you pull in a binary dependency, you need to make sure it's actually compatible.

@kkraus14 (Collaborator) commented Jun 5, 2020

@mt-jones is working on reworking the CMake build infrastructure across all of the RAPIDS libraries to better control this, but for now we can just be careful 😄

Labels: 3 - Ready for Review, CMake

Successfully merging this pull request may close these issues:

[QST] Should we depend on Thrust and CUB from github rather than the CUDA Toolkit?