Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Create include folder for C++ API distribution #1129

Merged
merged 60 commits into from
Oct 12, 2019

Conversation

dantegd
Copy link
Member

@dantegd dantegd commented Sep 23, 2019

PR closes #964 and proposes folder structure for C++ API include folder for distribution.

Proposed folder structure is to mirror cuML’s Python API structure (which itself mirror’s Scikit-learn). CuDF has a flat structure, so this PR can be used to discuss what path we would prefer to take. Important observations:

  • This folder structure is only for the include folder (that forms the consumable client C++ API). Should improve clarity and consistency with the consumable Python API.
  • On the other hand I don’t think moving the src folder to the same structure is such a good idea, many algorithms have multiple files (which could even have naming clashes), a good example are dbscan and kmeans which each have quite a few files of their own, which would make a hypothetical cluster folder there be very big unless we introduce even more nested-ness to our folder structure. So I propose we stick to one folder per algo for the src folder, should help devs keep somewhat lean folders, and with the changes to CMakeLists in this PR is transparent to both users and almost entirely transparent to devs.
  • The above is done via a new CUML_INCLUDE_DIR cmake variable that makes it so that our codebase can include the headers in include independent of its structure, which will also minimize code changes if we decide to move any API around in the future or the results of the discussion of this PR are that we prefer a flat include/cuml folder
  • Only a few files form src_prims (i.e. cuml prims) are needed in the API, so those are installed at the INSTALL step of CMake/make. This way the prims still stay header only independent of the algorithms, and the consumable C++ API has the needed contents for consumption.

Tasks:

  • Create include folder and sub folders
  • Move header files to include folders for initial discussion (Most have been moved, enough to start discussion in PR)
  • Update CMakeLists and header includes in C++ code to test that things work
  • Test that install does install all the required cuML and prims headers needed
  • Discuss proposed folder structure
  • Update PR with folder structure discussion results
  • Maybe: Update C++ Apis to be templated like trustworthiness (could be a future issue/PR)
  • Update Python imports and code for changes
  • Test that everything works

@dantegd dantegd changed the title [WIP][DISCUSS] Creat include folder for C++ API distribution [WIP][DISCUSS] Create include folder for C++ API distribution Sep 23, 2019
@teju85
Copy link
Member

teju85 commented Oct 1, 2019

@dantegd never mind my previous request. I think I have a working version after merging the latest of branch-0.11. Will update this PR soon.

@teju85 teju85 changed the title [WIP][DISCUSS] Create include folder for C++ API distribution [REVIEW] Create include folder for C++ API distribution Oct 7, 2019
@@ -68,41 +68,6 @@ bool get_arg(char **begin, char **end, const std::string &arg) {
return false;
}

class cachingDeviceAllocator : public ML::deviceAllocator {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most changes are just moving around headers... out of curiosity, why do these cachingDeviceAllocator blocks get removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken its because they were part of the examples instead of inside cuML C++ library itself

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's true. cachingDeviceAllocator is a wrapper over the corresponding allocator inside cub. We have cub as our dependency anyways. So, made little sense to keep it inside examples. Thus, moved this inside our include folders.

It also helped resolve a cuml -> ml-prims -> cuml circular dependency on header files.

@JohnZed
Copy link
Contributor

JohnZed commented Oct 7, 2019

In general, I like the approach. I can't say I read every single header. My only concern is that sometimes the python and c++ code directories differ arbitrarily, and C++ users now have to understand the python layout. Maybe we should (in a follow on change?) update the locations of C++ source files as well? I'm thinking like randomforest -> ensemble as one clear example but there are many others.

@dantegd
Copy link
Member Author

dantegd commented Oct 7, 2019

@JohnZed the problem with that and why I don't like the idea for the source files is that then folders can contain way more files in the source folders. For example cluster would include all the kmeans alongside all the dbscan folders, which would either require added folder sublevels (not ideal) or just living with mixed sources of different algorithms (even less ideal). One folder per algorithm source seems like the best way to avoid this mixing (with perhaps some exceptions). On the other hand where to put the header can be as simple as just asking one of the Python devs, or creating a first PR with the proposed API in its location which will then get feedback by us, which is a nice practice unto itself tbh

@dantegd dantegd added the 0 - Blocked Cannot progress due to external reasons label Oct 7, 2019
@dantegd
Copy link
Member Author

dantegd commented Oct 7, 2019

Note: PR will require new libcumlprims 0.11 nightly that is not yet available

@mike-wendt
Copy link
Contributor

@dantegd @teju85 This is no-longer blocked, packages are published: https://anaconda.org/rapidsai-nightly/libcumlprims/files

@dantegd dantegd removed the 0 - Blocked Cannot progress due to external reasons label Oct 10, 2019
@dantegd
Copy link
Member Author

dantegd commented Oct 10, 2019

rerun tests

@dantegd dantegd marked this pull request as ready for review October 10, 2019 21:07
@teju85
Copy link
Member

teju85 commented Oct 11, 2019

Thanks @mike-wendt !

Copy link
Contributor

@JohnZed JohnZed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read over all the utilities and cmake code but only spot-checked the headers + source files.
My comments are minor and address the python utilities. I don't think they are really functional problems, so I'm ok pushing then updating the source (as long as that really happens soon! ;))

@dantegd dantegd merged commit 494cf0b into rapidsai:branch-0.11 Oct 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currenty a work in progress Build or Dep Issues related to building the code or dependencies conda conda issue CUDA / C++ CUDA issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Create include folder in cpp to build libcuml API header files for distribution
5 participants